1 MVP

We’ve looked at a few different ways in which we can build models this week, including how to prepare them properly. This weekend we’ll build a multiple linear regression model on a dataset which will need some preparation. The data can be found in the data folder, along with a data dictionary.

We want to investigate the avocado dataset, and, in particular, to model the AveragePrice of the avocados. Use the tools we’ve worked with this week in order to prepare your dataset and find appropriate predictors. Once you’ve built your model use the validation techniques discussed on Wednesday to evaluate it. Feel free to focus either on building an explanatory or a predictive model, or both if you are feeling energetic!

As part of the MVP, we want you not just to run the code but also to have a go at interpreting the results, writing your thinking as comments in your script.

Hints and tips

  • region may lead to many dummy variables. Think carefully about whether to include this variable or not (there is no one ‘right’ answer to this!)
  • Think about whether each variable is categorical or numerical. If categorical, make sure that the variable is represented as a factor.
  • We will not treat this data as a time series, so Date will not be needed in your models, but can you extract any useful features out of Date before you discard it?
  • If you want to build a predictive model, consider using either leaps or glmulti to help with this.

2 Manual Approach : Explanatory Model

2.1 Data & Data Cleaning


Load libraries:

library(tidyverse)
library(GGally)
library(modelr)
library(janitor)

Load dataset and examine it:

avocados <- clean_names(read_csv("data/avocado.csv"))
head(avocados)


Ok, we have 14 variables. We can already see that some of them are of no use for modelling (x1, for example, is just a row index). I'm not sure whether the total_bags variable is the sum of small_bags, large_bags and x_large_bags, so I'll check that first.


# check to see if total_bags variable is just the sum of the other three
avocados %>%
  mutate(total_sum = small_bags + large_bags + x_large_bags) %>%
  select(total_bags, total_sum)


Yep, the total_bags column is just the sum of the other three, so this is another variable I can get rid of. I can also check the same for volume:

# check to see if total_volume variable is just the sum of the other three
avocados %>%
  mutate(total_sum = x4046 + x4225 + x4770) %>%
  select(total_volume, total_sum)


Nope, these aren’t the same, so we can keep all these in.
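Scrolling through two printed columns only checks the rows that happen to be displayed. As a rough sketch, a comparison with a small tolerance can summarise every row at once. The toy data frame and the helper name is_sum_of() below are made up for illustration; they are not part of the avocado data or of any package.

```r
# Toy data standing in for the bag columns (values are made up)
toy <- data.frame(
  small_bags   = c(10.5, 20.25, 0.1),
  large_bags   = c(1.5, 2.75, 0.2),
  x_large_bags = c(0, 1, 0.3)
)
parts <- toy[c("small_bags", "large_bags", "x_large_bags")]
toy$total_bags <- toy$small_bags + toy$large_bags + toy$x_large_bags

# Hypothetical helper: TRUE for each row where `total` equals the row sum
# of `parts`, allowing for floating point rounding
is_sum_of <- function(total, parts, tol = 1e-6) {
  abs(total - rowSums(parts)) < tol
}

# Proportion of rows where the total matches the sum of its parts
matches <- is_sum_of(toy$total_bags, parts)
mean(matches)

# A column that is NOT a clean sum gives a proportion below 1
toy$wrong_total <- toy$total_bags + c(0, 5, 0)
mismatches <- is_sum_of(toy$wrong_total, parts)
mean(mismatches)
```

A proportion of exactly 1 would support dropping the total column; anything lower means the columns carry different information.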


Now let’s check how many different levels of each categorical variable we have.


avocados %>%
  distinct(region) %>%
  summarise(number_of_regions = n())
avocados %>%
  distinct(date) %>%
  summarise(
    number_of_dates = n(),
    min_date = min(date),
    max_date = max(date)
  )


The region variable will lead to many categorical levels, but we can try leaving it in. We should also examine date and perhaps pull out from it whatever features we can. Including every single date would be too much, so we can extract the different parts of the date that might be useful. For example, we could try and split it into different quarters, or years.
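To see why a many-level factor is a concern, here is a small sketch using the built-in iris data (not the avocado data): a factor with k levels adds k - 1 dummy columns to the design matrix, with the first level acting as the baseline.

```r
# iris ships with R; Species is a factor with 3 levels
n_levels <- nlevels(iris$Species)

# model.matrix() shows the columns lm() would actually fit
design <- model.matrix(Sepal.Length ~ Species, data = iris)
colnames(design)

# Dummy columns = all columns except the intercept
n_dummies <- ncol(design) - 1
c(levels = n_levels, dummies = n_dummies)
```

By the same logic, every extra region adds one more coefficient to estimate and interpret.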

So, let’s do this now. Remove the variables we don’t need, change our categorical variables to factors, and extract parts of the date in case they are useful (and get rid of date).


library(lubridate)
trimmed_avocados <- avocados %>%
  mutate(
    quarter = as_factor(quarter(date)),
    year = as_factor(year),
    type = as_factor(type),
    region = as_factor(region)
  ) %>%
  select(-c(x1, date, total_bags))
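For reference, the quarter extraction can also be written without lubridate. This is a minimal base-R sketch on made-up dates (not rows from the avocado data), and it also pulls out month, which is another feature one could try.

```r
# Made-up dates, one per quarter
dates <- as.Date(c("2015-01-04", "2016-05-15", "2017-08-20", "2018-12-30"))

# Month and year as integers, parsed from the formatted date
month_num <- as.integer(format(dates, "%m"))
year_num  <- as.integer(format(dates, "%Y"))

# Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on
quarter_num <- (month_num - 1) %/% 3 + 1

# Store as factors so lm() treats them as categories, not numbers
features <- data.frame(
  year    = factor(year_num),
  quarter = factor(quarter_num),
  month   = factor(month_num)
)
features
```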


Now we’ve done our cleaning, we can check for aliased variables (i.e. combinations of variables in which one or more of the variables can be calculated exactly from other variables):


alias(average_price ~ ., data = trimmed_avocados)
## Model :
## average_price ~ total_volume + x4046 + x4225 + x4770 + small_bags + 
##     large_bags + x_large_bags + type + year + region + quarter

Nice, we don’t find any aliases. So we can keep going.
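For contrast, this is roughly what alias() reports when a column can be computed exactly from others. The toy data below is simulated for illustration: total is deliberately built as a + b, so it should show up as aliased.

```r
set.seed(1)

# Simulated predictors, with `total` an exact sum of `a` and `b`
toy <- data.frame(a = rnorm(20), b = rnorm(20))
toy$total <- toy$a + toy$b
toy$y <- 1 + 2 * toy$a - toy$b + rnorm(20)

# With the redundant column: a "Complete" component lists the aliased term
aliased <- alias(y ~ a + b + total, data = toy)
rownames(aliased$Complete)

# Without it: there is no "Complete" component at all
clean <- alias(y ~ a + b, data = toy)
is.null(clean$Complete)
```

Output that shows only the model formula, as above, corresponds to the second case.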


2.2 First variable

We need to decide on which variable we want to put in our model first. To do this, we should visualise it. Because we have so much data, ggpairs() might take a while to run, so we can split it up a bit.


# let's start by plotting the volume variables
trimmed_avocados %>%
  select(average_price, total_volume, x4046, x4225, x4770) %>%
  ggpairs() + 
   theme_grey(base_size = 8) # font size of labels


Hmm, these look highly correlated with one another in some instances. This is a sign that we won’t have to include all of these in our model, so we could think about removing x4225 and x4770 from our dataset to give ourselves fewer variables.

trimmed_avocados <- trimmed_avocados %>%
  select(-x4225, -x4770)
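When ggpairs() is slow or hard to read, the same screening can be sketched numerically with cor(). The data below is simulated to mimic one strongly related pair plus an unrelated column, and the 0.9 cut-off is an arbitrary assumption rather than a rule.

```r
set.seed(42)
n <- 200

# Simulated columns: x4046 is mostly a scaled copy of total_volume
base_volume <- rexp(n, rate = 1 / 1000)
toy <- data.frame(
  total_volume = base_volume,
  x4046        = 0.4 * base_volume + rnorm(n, sd = 20),
  noise        = rnorm(n)
)

# Pairwise correlations between all numeric columns
cors <- cor(toy)

# Keep each pair once (upper triangle) where |r| exceeds the cut-off
high <- which(abs(cors) > 0.9 & upper.tri(cors), arr.ind = TRUE)
data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]],
  r    = cors[high]
)
```

Pairs flagged this way are candidates for keeping only one member, which is the reasoning used for dropping x4225 and x4770.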


In terms of variables that correlate well with average_price… well, none of them do particularly. But that’s life. Our x4046 variable is probably our first candidate.

Next we can look at our bag variables.

trimmed_avocados %>%
  select(average_price, small_bags, large_bags, x_large_bags) %>%
  ggpairs() + 
   theme_grey(base_size = 8) # font size of labels


Hmm, again… not that promising. Some of the variables are highly correlated with one another, but not much seems highly correlated with average_price.


We can look at some of our categorical variables next:


trimmed_avocados %>%
  select(average_price, type, year, quarter) %>%
  ggpairs() + 
   theme_grey(base_size = 8) # font size of labels


This seems better! Our type variable seems to show variation in the boxplots. This might suggest that conventional avocados and organic ones have different prices (which again, makes sense).
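One way to put a number on a gap like this is to compare group means. The sketch below uses a tiny made-up data frame (not the avocado data) and also shows how a single-factor lm() reports the same information: the intercept is the baseline group's mean and the dummy coefficient is the difference between the groups.

```r
# Made-up prices for two types
toy <- data.frame(
  price = c(1.0, 1.2, 1.1, 1.6, 1.8, 1.7),
  type  = factor(rep(c("conventional", "organic"), each = 3))
)

# Mean price per type
group_means <- tapply(toy$price, toy$type, mean)
group_means

# Single-factor regression: intercept = baseline mean,
# typeorganic = organic mean minus conventional mean
fit <- lm(price ~ type, data = toy)
coef(fit)
```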

Finally, we can make a boxplot of our region variable. Because this has so many levels, it makes sense to plot it by itself so we can see it.


trimmed_avocados %>%
  ggplot(aes(x = region, y = average_price)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))


Ok, seems there is some variation in the boxplots between different regions, so that seems like it could be promising.


Let’s start by testing competing models. We decided that x4046, type, and region seemed reasonable:


library(ggfortify)

# build the model 
model1a <- lm(average_price ~ x4046, data = trimmed_avocados)

# check the diagnostics
autoplot(model1a)

# check the summary output
summary(model1a)
## 
## Call:
## lm(formula = average_price ~ x4046, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.98539 -0.29842 -0.03531  0.25459  1.82475 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.425e+00  2.993e-03  476.29   <2e-16 ***
## x4046       -6.631e-08  2.305e-09  -28.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3939 on 18247 degrees of freedom
## Multiple R-squared:  0.0434, Adjusted R-squared:  0.04334 
## F-statistic: 827.8 on 1 and 18247 DF,  p-value: < 2.2e-16

# build the model: type as the only predictor
model1b <- lm(average_price ~ type, data = trimmed_avocados)

# check the diagnostics
autoplot(model1b)

# check the summary output
summary(model1b)
## 
## Call:
## lm(formula = average_price ~ type, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.21400 -0.20400 -0.02804  0.18600  1.59600 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.158040   0.003321   348.7   <2e-16 ***
## typeorganic 0.495959   0.004697   105.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3173 on 18247 degrees of freedom
## Multiple R-squared:  0.3793, Adjusted R-squared:  0.3792 
## F-statistic: 1.115e+04 on 1 and 18247 DF,  p-value: < 2.2e-16

# build the model: region as the only predictor
model1c <- lm(average_price ~ region, data = trimmed_avocados)

# check the diagnostics
autoplot(model1c)

# check the summary output
summary(model1c)
## 
## Call:
## lm(formula = average_price ~ region, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97095 -0.28423 -0.03432  0.25207  1.76115 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.561036   0.020006  78.029  < 2e-16 ***
## regionAtlanta             -0.223077   0.028293  -7.885 3.33e-15 ***
## regionBaltimoreWashington -0.026805   0.028293  -0.947  0.34344    
## regionBoise               -0.212899   0.028293  -7.525 5.52e-14 ***
## regionBoston              -0.030148   0.028293  -1.066  0.28663    
## regionBuffaloRochester    -0.044201   0.028293  -1.562  0.11824    
## regionCalifornia          -0.165710   0.028293  -5.857 4.79e-09 ***
## regionCharlotte            0.045000   0.028293   1.591  0.11173    
## regionChicago             -0.004260   0.028293  -0.151  0.88031    
## regionCincinnatiDayton    -0.351834   0.028293 -12.436  < 2e-16 ***
## regionColumbus            -0.308254   0.028293 -10.895  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.028293 -16.805  < 2e-16 ***
## regionDenver              -0.342456   0.028293 -12.104  < 2e-16 ***
## regionDetroit             -0.284941   0.028293 -10.071  < 2e-16 ***
## regionGrandRapids         -0.056036   0.028293  -1.981  0.04765 *  
## regionGreatLakes          -0.222485   0.028293  -7.864 3.94e-15 ***
## regionHarrisburgScranton  -0.047751   0.028293  -1.688  0.09147 .  
## regionHartfordSpringfield  0.257604   0.028293   9.105  < 2e-16 ***
## regionHouston             -0.513107   0.028293 -18.136  < 2e-16 ***
## regionIndianapolis        -0.247041   0.028293  -8.732  < 2e-16 ***
## regionJacksonville        -0.050089   0.028293  -1.770  0.07668 .  
## regionLasVegas            -0.180118   0.028293  -6.366 1.98e-10 ***
## regionLosAngeles          -0.345030   0.028293 -12.195  < 2e-16 ***
## regionLouisville          -0.274349   0.028293  -9.697  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.028293  -4.685 2.82e-06 ***
## regionMidsouth            -0.156272   0.028293  -5.523 3.37e-08 ***
## regionNashville           -0.348935   0.028293 -12.333  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.028293  -9.057  < 2e-16 ***
## regionNewYork              0.166538   0.028293   5.886 4.02e-09 ***
## regionNortheast            0.040888   0.028293   1.445  0.14843    
## regionNorthernNewEngland  -0.083639   0.028293  -2.956  0.00312 ** 
## regionOrlando             -0.054822   0.028293  -1.938  0.05268 .  
## regionPhiladelphia         0.071095   0.028293   2.513  0.01199 *  
## regionPhoenixTucson       -0.336598   0.028293 -11.897  < 2e-16 ***
## regionPittsburgh          -0.196716   0.028293  -6.953 3.70e-12 ***
## regionPlains              -0.124527   0.028293  -4.401 1.08e-05 ***
## regionPortland            -0.243314   0.028293  -8.600  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.028293  -0.209  0.83434    
## regionRichmondNorfolk     -0.269704   0.028293  -9.533  < 2e-16 ***
## regionRoanoke             -0.313107   0.028293 -11.067  < 2e-16 ***
## regionSacramento           0.060533   0.028293   2.140  0.03241 *  
## regionSanDiego            -0.162870   0.028293  -5.757 8.72e-09 ***
## regionSanFrancisco         0.243166   0.028293   8.595  < 2e-16 ***
## regionSeattle             -0.118462   0.028293  -4.187 2.84e-05 ***
## regionSouthCarolina       -0.157751   0.028293  -5.576 2.50e-08 ***
## regionSouthCentral        -0.459793   0.028293 -16.251  < 2e-16 ***
## regionSoutheast           -0.163018   0.028293  -5.762 8.45e-09 ***
## regionSpokane             -0.115444   0.028293  -4.080 4.52e-05 ***
## regionStLouis             -0.130414   0.028293  -4.609 4.06e-06 ***
## regionSyracuse            -0.040710   0.028293  -1.439  0.15020    
## regionTampa               -0.152189   0.028293  -5.379 7.58e-08 ***
## regionTotalUS             -0.242012   0.028293  -8.554  < 2e-16 ***
## regionWest                -0.288817   0.028293 -10.208  < 2e-16 ***
## regionWestTexNewMexico    -0.299334   0.028356 -10.556  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3678 on 18195 degrees of freedom
## Multiple R-squared:  0.1681, Adjusted R-squared:  0.1657 
## F-statistic: 69.38 on 53 and 18195 DF,  p-value: < 2.2e-16


model1b with type is best, so we’ll keep that and re-run ggpairs() with the residuals (again omitting region because it’s too big).
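Rather than reading the value off each summary by eye, the comparison can be collected into one vector. This is a sketch on the built-in mtcars data with arbitrary candidate predictors; it uses adjusted R-squared, which penalises models with many coefficients (relevant when one candidate is a factor with dozens of levels).

```r
# Three single-predictor candidates on a built-in dataset
candidates <- list(
  wt  = lm(mpg ~ wt, data = mtcars),
  hp  = lm(mpg ~ hp, data = mtcars),
  cyl = lm(mpg ~ factor(cyl), data = mtcars)
)

# Pull adjusted R-squared out of each model summary
adj_r2 <- sapply(candidates, function(m) summary(m)$adj.r.squared)

# Candidates ordered from best to worst on this measure
sort(adj_r2, decreasing = TRUE)

# Name of the best candidate
best <- names(which.max(adj_r2))
best
```

The diagnostic plots still need checking for each candidate; a single number does not replace them.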


2.3 Second variable


avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model1b) %>%
  select(-c("average_price", "type", "region"))

ggpairs(avocados_remaining_resid) + 
  theme_grey(base_size = 8) # this bit just changes the axis label font size so we can see


Again, this isn’t showing any really high correlations between the residuals and any of our numeric variables. Looks like x4046, year, quarter could show something potentially (given the rubbish variables we have).
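The idea behind plotting residuals is that they hold whatever the current model has not explained, so a remaining predictor that correlates with them may add something new. A minimal base-R sketch of that step on the built-in mtcars data (the variable choices are arbitrary):

```r
# Current model with one predictor
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals are uncorrelated with a predictor already in the model
cor_included <- cor(resid(fit), mtcars$wt)

# Correlation of the residuals with predictors not yet in the model
remaining <- mtcars[, c("hp", "qsec", "drat")]
resid_cor <- sapply(remaining, function(x) cor(resid(fit), x))

# Largest absolute correlation first
resid_cor[order(abs(resid_cor), decreasing = TRUE)]
```

This only covers numeric predictors; factors such as region still need a boxplot of the residuals by level.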


trimmed_avocados %>%
  add_residuals(model1b) %>%
  ggplot(aes(x = region, y = resid)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))


Looks like region is another contender, so let’s try it alongside x4046, year and quarter.


model2a <- lm(average_price ~ type + x4046, data = trimmed_avocados)
autoplot(model2a)

summary(model2a)
## 
## Call:
## lm(formula = average_price ~ type + x4046, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.21416 -0.20029 -0.02736  0.18591  1.59589 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.171e+00  3.485e-03  336.13   <2e-16 ***
## typeorganic  4.827e-01  4.802e-03  100.52   <2e-16 ***
## x4046       -2.323e-08  1.898e-09  -12.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.316 on 18246 degrees of freedom
## Multiple R-squared:  0.3843, Adjusted R-squared:  0.3843 
## F-statistic:  5695 on 2 and 18246 DF,  p-value: < 2.2e-16

# second candidate: add year to type
model2b <- lm(average_price ~ type + year, data = trimmed_avocados)
autoplot(model2b)

summary(model2b)
## 
## Call:
## lm(formula = average_price ~ type + year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32320 -0.18722 -0.01722  0.18278  1.66337 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.127645   0.004704 239.735  < 2e-16 ***
## typeorganic  0.495980   0.004563 108.685  < 2e-16 ***
## year2016    -0.036995   0.005817  -6.360 2.07e-10 ***
## year2017     0.139580   0.005790  24.107  < 2e-16 ***
## year2018    -0.028104   0.009499  -2.959  0.00309 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3082 on 18244 degrees of freedom
## Multiple R-squared:  0.4142, Adjusted R-squared:  0.4141 
## F-statistic:  3225 on 4 and 18244 DF,  p-value: < 2.2e-16

# third candidate: add quarter to type
model2c <- lm(average_price ~ type + quarter, data = trimmed_avocados)
autoplot(model2c)

summary(model2c)
## 
## Call:
## lm(formula = average_price ~ type + quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.11458 -0.20089 -0.02458  0.18542  1.54687 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.058626   0.004718  224.38   <2e-16 ***
## typeorganic 0.495958   0.004543  109.16   <2e-16 ***
## quarter2    0.068546   0.006282   10.91   <2e-16 ***
## quarter3    0.206308   0.006281   32.84   <2e-16 ***
## quarter4    0.152040   0.006237   24.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3069 on 18244 degrees of freedom
## Multiple R-squared:  0.4193, Adjusted R-squared:  0.4192 
## F-statistic:  3294 on 4 and 18244 DF,  p-value: < 2.2e-16

# fourth candidate: add region to type
model2d <- lm(average_price ~ type + region, data = trimmed_avocados)
autoplot(model2d)

summary(model2d)
## 
## Call:
## lm(formula = average_price ~ type + region, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.09858 -0.16716 -0.01814  0.14692  1.51320 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.313079   0.014894  88.159  < 2e-16 ***
## typeorganic                0.495912   0.004017 123.452  < 2e-16 ***
## regionAtlanta             -0.223077   0.020871 -10.688  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.020871  -1.284  0.19906    
## regionBoise               -0.212899   0.020871 -10.201  < 2e-16 ***
## regionBoston              -0.030148   0.020871  -1.444  0.14863    
## regionBuffaloRochester    -0.044201   0.020871  -2.118  0.03421 *  
## regionCalifornia          -0.165710   0.020871  -7.940 2.15e-15 ***
## regionCharlotte            0.045000   0.020871   2.156  0.03109 *  
## regionChicago             -0.004260   0.020871  -0.204  0.83826    
## regionCincinnatiDayton    -0.351834   0.020871 -16.857  < 2e-16 ***
## regionColumbus            -0.308254   0.020871 -14.769  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.020871 -22.780  < 2e-16 ***
## regionDenver              -0.342456   0.020871 -16.408  < 2e-16 ***
## regionDetroit             -0.284941   0.020871 -13.652  < 2e-16 ***
## regionGrandRapids         -0.056036   0.020871  -2.685  0.00726 ** 
## regionGreatLakes          -0.222485   0.020871 -10.660  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.020871  -2.288  0.02216 *  
## regionHartfordSpringfield  0.257604   0.020871  12.342  < 2e-16 ***
## regionHouston             -0.513107   0.020871 -24.584  < 2e-16 ***
## regionIndianapolis        -0.247041   0.020871 -11.836  < 2e-16 ***
## regionJacksonville        -0.050089   0.020871  -2.400  0.01641 *  
## regionLasVegas            -0.180118   0.020871  -8.630  < 2e-16 ***
## regionLosAngeles          -0.345030   0.020871 -16.531  < 2e-16 ***
## regionLouisville          -0.274349   0.020871 -13.145  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.020871  -6.351 2.20e-10 ***
## regionMidsouth            -0.156272   0.020871  -7.487 7.35e-14 ***
## regionNashville           -0.348935   0.020871 -16.718  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.020871 -12.277  < 2e-16 ***
## regionNewYork              0.166538   0.020871   7.979 1.56e-15 ***
## regionNortheast            0.040888   0.020871   1.959  0.05013 .  
## regionNorthernNewEngland  -0.083639   0.020871  -4.007 6.16e-05 ***
## regionOrlando             -0.054822   0.020871  -2.627  0.00863 ** 
## regionPhiladelphia         0.071095   0.020871   3.406  0.00066 ***
## regionPhoenixTucson       -0.336598   0.020871 -16.127  < 2e-16 ***
## regionPittsburgh          -0.196716   0.020871  -9.425  < 2e-16 ***
## regionPlains              -0.124527   0.020871  -5.966 2.47e-09 ***
## regionPortland            -0.243314   0.020871 -11.658  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.020871  -0.284  0.77679    
## regionRichmondNorfolk     -0.269704   0.020871 -12.922  < 2e-16 ***
## regionRoanoke             -0.313107   0.020871 -15.002  < 2e-16 ***
## regionSacramento           0.060533   0.020871   2.900  0.00373 ** 
## regionSanDiego            -0.162870   0.020871  -7.803 6.35e-15 ***
## regionSanFrancisco         0.243166   0.020871  11.651  < 2e-16 ***
## regionSeattle             -0.118462   0.020871  -5.676 1.40e-08 ***
## regionSouthCarolina       -0.157751   0.020871  -7.558 4.28e-14 ***
## regionSouthCentral        -0.459793   0.020871 -22.030  < 2e-16 ***
## regionSoutheast           -0.163018   0.020871  -7.811 6.00e-15 ***
## regionSpokane             -0.115444   0.020871  -5.531 3.22e-08 ***
## regionStLouis             -0.130414   0.020871  -6.248 4.24e-10 ***
## regionSyracuse            -0.040710   0.020871  -1.951  0.05113 .  
## regionTampa               -0.152189   0.020871  -7.292 3.18e-13 ***
## regionTotalUS             -0.242012   0.020871 -11.595  < 2e-16 ***
## regionWest                -0.288817   0.020871 -13.838  < 2e-16 ***
## regionWestTexNewMexico    -0.297114   0.020918 -14.204  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2713 on 18194 degrees of freedom
## Multiple R-squared:  0.5473, Adjusted R-squared:  0.546 
## F-statistic: 407.4 on 54 and 18194 DF,  p-value: < 2.2e-16


So model2d with type and region comes out as the best of these. Some of the region coefficients are not significant at the 0.05 level, so let’s run an anova() to test whether including region as a whole improves the model.


# model1b is the model with average_price ~ type
# model2d is the model with average_price ~ type + region

# we want to compare the two
anova(model1b, model2d)


It seems region is significant overall, so we’ll keep it in!
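The overall p-value can be pulled out of the anova() table directly. The sketch below runs the same kind of nested comparison on the built-in mtcars data, with factor(cyl) standing in for a multi-level categorical predictor; the F-test asks whether all of its dummy coefficients together improve on the smaller model.

```r
# Smaller model, and the same model plus a 3-level factor
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# F-test of the nested pair: row 2 describes the added terms
comparison <- anova(small, big)
comparison

# Number of extra coefficients and the overall p-value for the factor
extra_coefs <- comparison$Df[2]
p_value <- comparison$`Pr(>F)`[2]
c(extra_coefs = extra_coefs, p_value = p_value)
```

A small p-value here supports keeping the whole factor, even when some individual levels are not significantly different from the baseline.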


2.4 Third variable


model2d is our model with average_price ~ type + region, and it explains about 55% of the variance in average price (multiple R-squared of 0.5473). This isn’t really very high, so we can think about adding a third predictor now. Again, we remove the variables already in the model from our data and check the residuals against what is left.


avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model2d) %>%
  select(-c("average_price", "type", "region"))

ggpairs(avocados_remaining_resid) + 
   theme_grey(base_size = 8) # font size of labels


The next contender variables look to be x_large_bags, year and quarter. Let’s try them out.


model3a <- lm(average_price ~ type + region + x_large_bags, data = trimmed_avocados)
autoplot(model3a)

summary(model3a)
## 
## Call:
## lm(formula = average_price ~ type + region + x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.10024 -0.16726 -0.01734  0.14591  1.51156 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.311e+00  1.489e-02  88.033  < 2e-16 ***
## typeorganic                5.001e-01  4.101e-03 121.953  < 2e-16 ***
## regionAtlanta             -2.235e-01  2.086e-02 -10.718  < 2e-16 ***
## regionBaltimoreWashington -2.713e-02  2.086e-02  -1.301 0.193298    
## regionBoise               -2.128e-01  2.086e-02 -10.204  < 2e-16 ***
## regionBoston              -3.023e-02  2.086e-02  -1.449 0.147234    
## regionBuffaloRochester    -4.428e-02  2.086e-02  -2.123 0.033774 *  
## regionCalifornia          -1.762e-01  2.096e-02  -8.408  < 2e-16 ***
## regionCharlotte            4.495e-02  2.086e-02   2.155 0.031177 *  
## regionChicago             -4.936e-03  2.086e-02  -0.237 0.812924    
## regionCincinnatiDayton    -3.523e-01  2.086e-02 -16.890  < 2e-16 ***
## regionColumbus            -3.086e-01  2.086e-02 -14.796  < 2e-16 ***
## regionDallasFtWorth       -4.762e-01  2.086e-02 -22.832  < 2e-16 ***
## regionDenver              -3.425e-01  2.086e-02 -16.420  < 2e-16 ***
## regionDetroit             -2.882e-01  2.087e-02 -13.810  < 2e-16 ***
## regionGrandRapids         -5.764e-02  2.086e-02  -2.763 0.005731 ** 
## regionGreatLakes          -2.353e-01  2.101e-02 -11.198  < 2e-16 ***
## regionHarrisburgScranton  -4.798e-02  2.086e-02  -2.300 0.021451 *  
## regionHartfordSpringfield  2.575e-01  2.086e-02  12.347  < 2e-16 ***
## regionHouston             -5.137e-01  2.086e-02 -24.628  < 2e-16 ***
## regionIndianapolis        -2.475e-01  2.086e-02 -11.867  < 2e-16 ***
## regionJacksonville        -5.021e-02  2.086e-02  -2.407 0.016074 *  
## regionLasVegas            -1.801e-01  2.086e-02  -8.633  < 2e-16 ***
## regionLosAngeles          -3.532e-01  2.092e-02 -16.881  < 2e-16 ***
## regionLouisville          -2.745e-01  2.086e-02 -13.160  < 2e-16 ***
## regionMiamiFtLauderdale   -1.331e-01  2.086e-02  -6.380 1.81e-10 ***
## regionMidsouth            -1.590e-01  2.086e-02  -7.619 2.68e-14 ***
## regionNashville           -3.491e-01  2.086e-02 -16.736  < 2e-16 ***
## regionNewOrleansMobile    -2.572e-01  2.086e-02 -12.330  < 2e-16 ***
## regionNewYork              1.659e-01  2.086e-02   7.954 1.91e-15 ***
## regionNortheast            3.834e-02  2.086e-02   1.838 0.066151 .  
## regionNorthernNewEngland  -8.377e-02  2.086e-02  -4.017 5.93e-05 ***
## regionOrlando             -5.523e-02  2.086e-02  -2.648 0.008111 ** 
## regionPhiladelphia         7.097e-02  2.086e-02   3.403 0.000669 ***
## regionPhoenixTucson       -3.368e-01  2.086e-02 -16.149  < 2e-16 ***
## regionPittsburgh          -1.967e-01  2.086e-02  -9.433  < 2e-16 ***
## regionPlains              -1.267e-01  2.086e-02  -6.072 1.29e-09 ***
## regionPortland            -2.434e-01  2.086e-02 -11.669  < 2e-16 ***
## regionRaleighGreensboro   -6.021e-03  2.086e-02  -0.289 0.772828    
## regionRichmondNorfolk     -2.699e-01  2.086e-02 -12.939  < 2e-16 ***
## regionRoanoke             -3.132e-01  2.086e-02 -15.015  < 2e-16 ***
## regionSacramento           6.020e-02  2.086e-02   2.886 0.003904 ** 
## regionSanDiego            -1.631e-01  2.086e-02  -7.819 5.64e-15 ***
## regionSanFrancisco         2.428e-01  2.086e-02  11.642  < 2e-16 ***
## regionSeattle             -1.185e-01  2.086e-02  -5.682 1.35e-08 ***
## regionSouthCarolina       -1.581e-01  2.086e-02  -7.581 3.59e-14 ***
## regionSouthCentral        -4.650e-01  2.088e-02 -22.268  < 2e-16 ***
## regionSoutheast           -1.680e-01  2.088e-02  -8.046 9.10e-16 ***
## regionSpokane             -1.154e-01  2.086e-02  -5.531 3.22e-08 ***
## regionStLouis             -1.308e-01  2.086e-02  -6.270 3.69e-10 ***
## regionSyracuse            -4.071e-02  2.086e-02  -1.952 0.050993 .  
## regionTampa               -1.526e-01  2.086e-02  -7.315 2.68e-13 ***
## regionTotalUS             -2.852e-01  2.255e-02 -12.648  < 2e-16 ***
## regionWest                -2.904e-01  2.086e-02 -13.922  < 2e-16 ***
## regionWestTexNewMexico    -2.976e-01  2.090e-02 -14.238  < 2e-16 ***
## x_large_bags               6.810e-07  1.351e-07   5.040 4.70e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2711 on 18193 degrees of freedom
## Multiple R-squared:  0.548,  Adjusted R-squared:  0.5466 
## F-statistic:   401 on 55 and 18193 DF,  p-value: < 2.2e-16

# second candidate: add year to type + region
model3b <- lm(average_price ~ type + region + year, data = trimmed_avocados)
autoplot(model3b)

summary(model3b)
## 
## Call:
## lm(formula = average_price ~ type + region + year, data = trimmed_avocados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1532 -0.1497 -0.0060  0.1419  1.4849 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.282672   0.014600  87.857  < 2e-16 ***
## typeorganic                0.495933   0.003859 128.501  < 2e-16 ***
## regionAtlanta             -0.223077   0.020052 -11.125  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.020052  -1.337 0.181322    
## regionBoise               -0.212899   0.020052 -10.617  < 2e-16 ***
## regionBoston              -0.030148   0.020052  -1.503 0.132735    
## regionBuffaloRochester    -0.044201   0.020052  -2.204 0.027515 *  
## regionCalifornia          -0.165710   0.020052  -8.264  < 2e-16 ***
## regionCharlotte            0.045000   0.020052   2.244 0.024835 *  
## regionChicago             -0.004260   0.020052  -0.212 0.831748    
## regionCincinnatiDayton    -0.351834   0.020052 -17.546  < 2e-16 ***
## regionColumbus            -0.308254   0.020052 -15.373  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.020052 -23.710  < 2e-16 ***
## regionDenver              -0.342456   0.020052 -17.078  < 2e-16 ***
## regionDetroit             -0.284941   0.020052 -14.210  < 2e-16 ***
## regionGrandRapids         -0.056036   0.020052  -2.794 0.005204 ** 
## regionGreatLakes          -0.222485   0.020052 -11.095  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.020052  -2.381 0.017259 *  
## regionHartfordSpringfield  0.257604   0.020052  12.847  < 2e-16 ***
## regionHouston             -0.513107   0.020052 -25.589  < 2e-16 ***
## regionIndianapolis        -0.247041   0.020052 -12.320  < 2e-16 ***
## regionJacksonville        -0.050089   0.020052  -2.498 0.012501 *  
## regionLasVegas            -0.180118   0.020052  -8.982  < 2e-16 ***
## regionLosAngeles          -0.345030   0.020052 -17.207  < 2e-16 ***
## regionLouisville          -0.274349   0.020052 -13.682  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.020052  -6.610 3.95e-11 ***
## regionMidsouth            -0.156272   0.020052  -7.793 6.88e-15 ***
## regionNashville           -0.348935   0.020052 -17.401  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.020052 -12.779  < 2e-16 ***
## regionNewYork              0.166538   0.020052   8.305  < 2e-16 ***
## regionNortheast            0.040888   0.020052   2.039 0.041459 *  
## regionNorthernNewEngland  -0.083639   0.020052  -4.171 3.05e-05 ***
## regionOrlando             -0.054822   0.020052  -2.734 0.006263 ** 
## regionPhiladelphia         0.071095   0.020052   3.545 0.000393 ***
## regionPhoenixTucson       -0.336598   0.020052 -16.786  < 2e-16 ***
## regionPittsburgh          -0.196716   0.020052  -9.810  < 2e-16 ***
## regionPlains              -0.124527   0.020052  -6.210 5.41e-10 ***
## regionPortland            -0.243314   0.020052 -12.134  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.020052  -0.295 0.767930    
## regionRichmondNorfolk     -0.269704   0.020052 -13.450  < 2e-16 ***
## regionRoanoke             -0.313107   0.020052 -15.615  < 2e-16 ***
## regionSacramento           0.060533   0.020052   3.019 0.002542 ** 
## regionSanDiego            -0.162870   0.020052  -8.122 4.86e-16 ***
## regionSanFrancisco         0.243166   0.020052  12.127  < 2e-16 ***
## regionSeattle             -0.118462   0.020052  -5.908 3.53e-09 ***
## regionSouthCarolina       -0.157751   0.020052  -7.867 3.83e-15 ***
## regionSouthCentral        -0.459793   0.020052 -22.930  < 2e-16 ***
## regionSoutheast           -0.163018   0.020052  -8.130 4.58e-16 ***
## regionSpokane             -0.115444   0.020052  -5.757 8.69e-09 ***
## regionStLouis             -0.130414   0.020052  -6.504 8.04e-11 ***
## regionSyracuse            -0.040710   0.020052  -2.030 0.042350 *  
## regionTampa               -0.152189   0.020052  -7.590 3.36e-14 ***
## regionTotalUS             -0.242012   0.020052 -12.069  < 2e-16 ***
## regionWest                -0.288817   0.020052 -14.403  < 2e-16 ***
## regionWestTexNewMexico    -0.296552   0.020097 -14.756  < 2e-16 ***
## year2016                  -0.036970   0.004920  -7.515 5.96e-14 ***
## year2017                   0.139555   0.004897  28.500  < 2e-16 ***
## year2018                  -0.028078   0.008033  -3.495 0.000475 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2607 on 18191 degrees of freedom
## Multiple R-squared:  0.5822, Adjusted R-squared:  0.5809 
## F-statistic: 444.8 on 57 and 18191 DF,  p-value: < 2.2e-16

# third candidate: add quarter to type + region
model3c <- lm(average_price ~ type + region + quarter, data = trimmed_avocados)
autoplot(model3c)

summary(model3c)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06767 -0.15971 -0.01185  0.14629  1.54411 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.213689   0.014517  83.603  < 2e-16 ***
## typeorganic                0.495911   0.003835 129.296  < 2e-16 ***
## regionAtlanta             -0.223077   0.019928 -11.194  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.019928  -1.345 0.178619    
## regionBoise               -0.212899   0.019928 -10.683  < 2e-16 ***
## regionBoston              -0.030148   0.019928  -1.513 0.130339    
## regionBuffaloRochester    -0.044201   0.019928  -2.218 0.026565 *  
## regionCalifornia          -0.165710   0.019928  -8.315  < 2e-16 ***
## regionCharlotte            0.045000   0.019928   2.258 0.023950 *  
## regionChicago             -0.004260   0.019928  -0.214 0.830716    
## regionCincinnatiDayton    -0.351834   0.019928 -17.655  < 2e-16 ***
## regionColumbus            -0.308254   0.019928 -15.468  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.019928 -23.858  < 2e-16 ***
## regionDenver              -0.342456   0.019928 -17.185  < 2e-16 ***
## regionDetroit             -0.284941   0.019928 -14.298  < 2e-16 ***
## regionGrandRapids         -0.056036   0.019928  -2.812 0.004931 ** 
## regionGreatLakes          -0.222485   0.019928 -11.164  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.019928  -2.396 0.016577 *  
## regionHartfordSpringfield  0.257604   0.019928  12.927  < 2e-16 ***
## regionHouston             -0.513107   0.019928 -25.748  < 2e-16 ***
## regionIndianapolis        -0.247041   0.019928 -12.397  < 2e-16 ***
## regionJacksonville        -0.050089   0.019928  -2.513 0.011963 *  
## regionLasVegas            -0.180118   0.019928  -9.038  < 2e-16 ***
## regionLosAngeles          -0.345030   0.019928 -17.314  < 2e-16 ***
## regionLouisville          -0.274349   0.019928 -13.767  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.019928  -6.651 2.99e-11 ***
## regionMidsouth            -0.156272   0.019928  -7.842 4.69e-15 ***
## regionNashville           -0.348935   0.019928 -17.510  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.019928 -12.858  < 2e-16 ***
## regionNewYork              0.166538   0.019928   8.357  < 2e-16 ***
## regionNortheast            0.040888   0.019928   2.052 0.040208 *  
## regionNorthernNewEngland  -0.083639   0.019928  -4.197 2.72e-05 ***
## regionOrlando             -0.054822   0.019928  -2.751 0.005947 ** 
## regionPhiladelphia         0.071095   0.019928   3.568 0.000361 ***
## regionPhoenixTucson       -0.336598   0.019928 -16.891  < 2e-16 ***
## regionPittsburgh          -0.196716   0.019928  -9.871  < 2e-16 ***
## regionPlains              -0.124527   0.019928  -6.249 4.23e-10 ***
## regionPortland            -0.243314   0.019928 -12.210  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.019928  -0.297 0.766527    
## regionRichmondNorfolk     -0.269704   0.019928 -13.534  < 2e-16 ***
## regionRoanoke             -0.313107   0.019928 -15.712  < 2e-16 ***
## regionSacramento           0.060533   0.019928   3.038 0.002389 ** 
## regionSanDiego            -0.162870   0.019928  -8.173 3.21e-16 ***
## regionSanFrancisco         0.243166   0.019928  12.202  < 2e-16 ***
## regionSeattle             -0.118462   0.019928  -5.944 2.82e-09 ***
## regionSouthCarolina       -0.157751   0.019928  -7.916 2.59e-15 ***
## regionSouthCentral        -0.459793   0.019928 -23.073  < 2e-16 ***
## regionSoutheast           -0.163018   0.019928  -8.180 3.02e-16 ***
## regionSpokane             -0.115444   0.019928  -5.793 7.03e-09 ***
## regionStLouis             -0.130414   0.019928  -6.544 6.14e-11 ***
## regionSyracuse            -0.040710   0.019928  -2.043 0.041082 *  
## regionTampa               -0.152189   0.019928  -7.637 2.33e-14 ***
## regionTotalUS             -0.242012   0.019928 -12.144  < 2e-16 ***
## regionWest                -0.288817   0.019928 -14.493  < 2e-16 ***
## regionWestTexNewMexico    -0.297141   0.019973 -14.877  < 2e-16 ***
## quarter2                   0.068479   0.005303  12.912  < 2e-16 ***
## quarter3                   0.206308   0.005303  38.906  < 2e-16 ***
## quarter4                   0.152007   0.005265  28.869  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2591 on 18191 degrees of freedom
## Multiple R-squared:  0.5874, Adjusted R-squared:  0.5861 
## F-statistic: 454.3 on 57 and 18191 DF,  p-value: < 2.2e-16

So model3c, with type, region and quarter, wins out here: its R^2 of 0.5874 edges out the 0.5822 we got using year as the third predictor. The diagnostics still look reasonable, with perhaps some mild heteroscedasticity.


2.5 Fourth variable


Remember that with two predictors our R^2 value was 0.5473. Now, with three predictors, we are at 0.5874, which seems a reasonable improvement. So let’s see how much improvement we get by adding a fourth variable. Again, we check the residuals to see which variables we should try adding.


avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model3c) %>%
  select(-c("average_price", "type", "region", "quarter"))

ggpairs(avocados_remaining_resid) + 
   theme_grey(base_size = 8) # font size of labels
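
As a rough numeric companion to the ggpairs() plot, one option is to regress the current residuals on each remaining candidate and rank the candidates by R^2. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in for trimmed_avocados, and the names cars, current_model, candidates and resid_r2 are hypothetical.

```r
# Stand-in data: mtcars ships with base R (assumption: not the avocado data)
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Plays the role of the current model (e.g. model3c)
current_model <- lm(mpg ~ cyl + am, data = cars)

# Predictors that are not yet in the model
candidates <- c("wt", "hp", "qsec", "drat")

# Attach the residuals of the current model as a new column
with_resid <- cbind(cars, model_resid = residuals(current_model))

# For each candidate, fit model_resid ~ candidate and record the R^2
resid_r2 <- sapply(candidates, function(v) {
  fit <- lm(reformulate(v, response = "model_resid"), data = with_resid)
  summary(fit)$r.squared
})

# Largest values first: the strongest contenders for the next predictor
sort(resid_r2, decreasing = TRUE)
```

This works for factor candidates as well as numeric ones, since lm() expands factors into dummy variables. It only picks up linear association, so it complements the plots rather than replacing them.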


The contender variables here are x_large_bags and year, so let’s try them out.


model4a <- lm(average_price ~ type + region + quarter + x_large_bags, data = trimmed_avocados)
autoplot(model4a)

summary(model4a)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + x_large_bags, 
##     data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06889 -0.16013 -0.01154  0.14553  1.54291 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.212e+00  1.451e-02  83.493  < 2e-16 ***
## typeorganic                4.998e-01  3.916e-03 127.614  < 2e-16 ***
## regionAtlanta             -2.235e-01  1.992e-02 -11.222  < 2e-16 ***
## regionBaltimoreWashington -2.711e-02  1.992e-02  -1.361 0.173535    
## regionBoise               -2.128e-01  1.992e-02 -10.687  < 2e-16 ***
## regionBoston              -3.022e-02  1.992e-02  -1.518 0.129137    
## regionBuffaloRochester    -4.427e-02  1.992e-02  -2.223 0.026233 *  
## regionCalifornia          -1.753e-01  2.002e-02  -8.759  < 2e-16 ***
## regionCharlotte            4.495e-02  1.992e-02   2.257 0.024015 *  
## regionChicago             -4.877e-03  1.992e-02  -0.245 0.806549    
## regionCincinnatiDayton    -3.522e-01  1.992e-02 -17.686  < 2e-16 ***
## regionColumbus            -3.086e-01  1.992e-02 -15.494  < 2e-16 ***
## regionDallasFtWorth       -4.762e-01  1.992e-02 -23.908  < 2e-16 ***
## regionDenver              -3.425e-01  1.992e-02 -17.196  < 2e-16 ***
## regionDetroit             -2.879e-01  1.993e-02 -14.449  < 2e-16 ***
## regionGrandRapids         -5.750e-02  1.992e-02  -2.887 0.003898 ** 
## regionGreatLakes          -2.342e-01  2.006e-02 -11.671  < 2e-16 ***
## regionHarrisburgScranton  -4.796e-02  1.992e-02  -2.408 0.016054 *  
## regionHartfordSpringfield  2.575e-01  1.992e-02  12.931  < 2e-16 ***
## regionHouston             -5.136e-01  1.992e-02 -25.789  < 2e-16 ***
## regionIndianapolis        -2.475e-01  1.992e-02 -12.426  < 2e-16 ***
## regionJacksonville        -5.020e-02  1.992e-02  -2.521 0.011720 *  
## regionLasVegas            -1.801e-01  1.992e-02  -9.041  < 2e-16 ***
## regionLosAngeles          -3.524e-01  1.998e-02 -17.644  < 2e-16 ***
## regionLouisville          -2.745e-01  1.992e-02 -13.781  < 2e-16 ***
## regionMiamiFtLauderdale   -1.330e-01  1.992e-02  -6.679 2.47e-11 ***
## regionMidsouth            -1.587e-01  1.992e-02  -7.967 1.72e-15 ***
## regionNashville           -3.491e-01  1.992e-02 -17.527  < 2e-16 ***
## regionNewOrleansMobile    -2.571e-01  1.992e-02 -12.909  < 2e-16 ***
## regionNewYork              1.660e-01  1.992e-02   8.333  < 2e-16 ***
## regionNortheast            3.856e-02  1.992e-02   1.936 0.052939 .  
## regionNorthernNewEngland  -8.376e-02  1.992e-02  -4.206 2.61e-05 ***
## regionOrlando             -5.519e-02  1.992e-02  -2.771 0.005592 ** 
## regionPhiladelphia         7.098e-02  1.992e-02   3.564 0.000366 ***
## regionPhoenixTucson       -3.368e-01  1.992e-02 -16.911  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.992e-02  -9.879  < 2e-16 ***
## regionPlains              -1.265e-01  1.992e-02  -6.350 2.20e-10 ***
## regionPortland            -2.434e-01  1.992e-02 -12.220  < 2e-16 ***
## regionRaleighGreensboro   -6.012e-03  1.992e-02  -0.302 0.762753    
## regionRichmondNorfolk     -2.699e-01  1.992e-02 -13.549  < 2e-16 ***
## regionRoanoke             -3.132e-01  1.992e-02 -15.725  < 2e-16 ***
## regionSacramento           6.023e-02  1.992e-02   3.024 0.002497 ** 
## regionSanDiego            -1.631e-01  1.992e-02  -8.187 2.85e-16 ***
## regionSanFrancisco         2.429e-01  1.992e-02  12.194  < 2e-16 ***
## regionSeattle             -1.185e-01  1.992e-02  -5.950 2.72e-09 ***
## regionSouthCarolina       -1.581e-01  1.992e-02  -7.938 2.18e-15 ***
## regionSouthCentral        -4.646e-01  1.994e-02 -23.297  < 2e-16 ***
## regionSoutheast           -1.676e-01  1.994e-02  -8.404  < 2e-16 ***
## regionSpokane             -1.154e-01  1.992e-02  -5.793 7.02e-09 ***
## regionStLouis             -1.307e-01  1.992e-02  -6.565 5.35e-11 ***
## regionSyracuse            -4.071e-02  1.992e-02  -2.044 0.040974 *  
## regionTampa               -1.525e-01  1.992e-02  -7.659 1.96e-14 ***
## regionTotalUS             -2.814e-01  2.153e-02 -13.068  < 2e-16 ***
## regionWest                -2.903e-01  1.992e-02 -14.573  < 2e-16 ***
## regionWestTexNewMexico    -2.976e-01  1.996e-02 -14.910  < 2e-16 ***
## quarter2                   6.806e-02  5.301e-03  12.839  < 2e-16 ***
## quarter3                   2.055e-01  5.302e-03  38.761  < 2e-16 ***
## quarter4                   1.527e-01  5.264e-03  29.001  < 2e-16 ***
## x_large_bags               6.215e-07  1.292e-07   4.810 1.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2589 on 18190 degrees of freedom
## Multiple R-squared:  0.5879, Adjusted R-squared:  0.5866 
## F-statistic: 447.4 on 58 and 18190 DF,  p-value: < 2.2e-16
model4b <- lm(average_price ~ type + region + quarter + year, data = trimmed_avocados)
autoplot(model4b)

summary(model4b)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year, 
##     data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03683 -0.14588 -0.00412  0.14386  1.43930 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.167184   0.014290  81.677  < 2e-16 ***
## typeorganic                0.495930   0.003675 134.950  < 2e-16 ***
## regionAtlanta             -0.223077   0.019094 -11.683  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.019094  -1.404 0.160383    
## regionBoise               -0.212899   0.019094 -11.150  < 2e-16 ***
## regionBoston              -0.030148   0.019094  -1.579 0.114368    
## regionBuffaloRochester    -0.044201   0.019094  -2.315 0.020627 *  
## regionCalifornia          -0.165710   0.019094  -8.679  < 2e-16 ***
## regionCharlotte            0.045000   0.019094   2.357 0.018445 *  
## regionChicago             -0.004260   0.019094  -0.223 0.823439    
## regionCincinnatiDayton    -0.351834   0.019094 -18.427  < 2e-16 ***
## regionColumbus            -0.308254   0.019094 -16.144  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.019094 -24.900  < 2e-16 ***
## regionDenver              -0.342456   0.019094 -17.935  < 2e-16 ***
## regionDetroit             -0.284941   0.019094 -14.923  < 2e-16 ***
## regionGrandRapids         -0.056036   0.019094  -2.935 0.003342 ** 
## regionGreatLakes          -0.222485   0.019094 -11.652  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.019094  -2.501 0.012397 *  
## regionHartfordSpringfield  0.257604   0.019094  13.491  < 2e-16 ***
## regionHouston             -0.513107   0.019094 -26.873  < 2e-16 ***
## regionIndianapolis        -0.247041   0.019094 -12.938  < 2e-16 ***
## regionJacksonville        -0.050089   0.019094  -2.623 0.008716 ** 
## regionLasVegas            -0.180118   0.019094  -9.433  < 2e-16 ***
## regionLosAngeles          -0.345030   0.019094 -18.070  < 2e-16 ***
## regionLouisville          -0.274349   0.019094 -14.368  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.019094  -6.942 4.00e-12 ***
## regionMidsouth            -0.156272   0.019094  -8.184 2.91e-16 ***
## regionNashville           -0.348935   0.019094 -18.275  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.019094 -13.420  < 2e-16 ***
## regionNewYork              0.166538   0.019094   8.722  < 2e-16 ***
## regionNortheast            0.040888   0.019094   2.141 0.032255 *  
## regionNorthernNewEngland  -0.083639   0.019094  -4.380 1.19e-05 ***
## regionOrlando             -0.054822   0.019094  -2.871 0.004094 ** 
## regionPhiladelphia         0.071095   0.019094   3.723 0.000197 ***
## regionPhoenixTucson       -0.336598   0.019094 -17.629  < 2e-16 ***
## regionPittsburgh          -0.196716   0.019094 -10.303  < 2e-16 ***
## regionPlains              -0.124527   0.019094  -6.522 7.13e-11 ***
## regionPortland            -0.243314   0.019094 -12.743  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.019094  -0.310 0.756641    
## regionRichmondNorfolk     -0.269704   0.019094 -14.125  < 2e-16 ***
## regionRoanoke             -0.313107   0.019094 -16.398  < 2e-16 ***
## regionSacramento           0.060533   0.019094   3.170 0.001526 ** 
## regionSanDiego            -0.162870   0.019094  -8.530  < 2e-16 ***
## regionSanFrancisco         0.243166   0.019094  12.735  < 2e-16 ***
## regionSeattle             -0.118462   0.019094  -6.204 5.62e-10 ***
## regionSouthCarolina       -0.157751   0.019094  -8.262  < 2e-16 ***
## regionSouthCentral        -0.459793   0.019094 -24.081  < 2e-16 ***
## regionSoutheast           -0.163018   0.019094  -8.538  < 2e-16 ***
## regionSpokane             -0.115444   0.019094  -6.046 1.51e-09 ***
## regionStLouis             -0.130414   0.019094  -6.830 8.75e-12 ***
## regionSyracuse            -0.040710   0.019094  -2.132 0.033011 *  
## regionTampa               -0.152189   0.019094  -7.971 1.67e-15 ***
## regionTotalUS             -0.242012   0.019094 -12.675  < 2e-16 ***
## regionWest                -0.288817   0.019094 -15.126  < 2e-16 ***
## regionWestTexNewMexico    -0.296624   0.019137 -15.500  < 2e-16 ***
## quarter2                   0.081121   0.005410  14.996  < 2e-16 ***
## quarter3                   0.218901   0.005409  40.471  < 2e-16 ***
## quarter4                   0.161972   0.005376  30.130  < 2e-16 ***
## year2016                  -0.036978   0.004684  -7.894 3.10e-15 ***
## year2017                   0.138658   0.004663  29.735  < 2e-16 ***
## year2018                   0.087412   0.008334  10.488  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2482 on 18188 degrees of freedom
## Multiple R-squared:  0.6213, Adjusted R-squared:   0.62 
## F-statistic: 497.3 on 60 and 18188 DF,  p-value: < 2.2e-16

Hmm, model4b with type, region, quarter and year wins here: model4a only reached an R^2 of 0.5879. Adding year has improved our R^2 from 0.5874 (with three predictors) to 0.6213. That’s quite good.
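
Because year enters as a factor with several dummy coefficients, a single test for the whole term can be more convenient than reading the individual rows. A minimal sketch of such a nested-model F-test with anova() follows. It again uses mtcars as stand-in data, with gear playing the role of year; smaller, larger and comparison are hypothetical names.

```r
# Stand-in data (assumption): gear is a three-level factor, like a short year variable
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

# Nested pair of models: the larger one adds a single factor term
smaller <- lm(mpg ~ cyl + wt, data = cars)
larger  <- update(smaller, . ~ . + gear)

# F-test of the null hypothesis that all gear coefficients are zero
comparison <- anova(smaller, larger)
comparison

# R^2 for both models, to see the size of the gain next to the test
c(smaller = summary(smaller)$r.squared,
  larger  = summary(larger)$r.squared)
```

With the avocado models, the analogous call would be anova(model3c, model4b), assuming both objects are still in the session.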


2.6 Fifth variable


We are likely now pursuing variables with rather limited explanatory power, but let’s check for one more main effect and see how much it adds.


avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model4b) %>%
  select(-c("average_price", "type", "region", "quarter", "year"))

ggpairs(avocados_remaining_resid) + 
   theme_grey(base_size = 8) # font size of labels


It looks like x_large_bags is the remaining contender, so let’s check it out!


model5 <- lm(average_price ~ type + region + quarter + year + x_large_bags, data = trimmed_avocados)
autoplot(model5)

summary(model5)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03610 -0.14545 -0.00439  0.14420  1.43907 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.167e+00  1.429e-02  81.687  < 2e-16 ***
## typeorganic                4.982e-01  3.755e-03 132.674  < 2e-16 ***
## regionAtlanta             -2.233e-01  1.909e-02 -11.698  < 2e-16 ***
## regionBaltimoreWashington -2.698e-02  1.909e-02  -1.413 0.157614    
## regionBoise               -2.129e-01  1.909e-02 -11.151  < 2e-16 ***
## regionBoston              -3.019e-02  1.909e-02  -1.582 0.113769    
## regionBuffaloRochester    -4.424e-02  1.909e-02  -2.318 0.020485 *  
## regionCalifornia          -1.713e-01  1.919e-02  -8.925  < 2e-16 ***
## regionCharlotte            4.497e-02  1.909e-02   2.356 0.018493 *  
## regionChicago             -4.616e-03  1.909e-02  -0.242 0.808941    
## regionCincinnatiDayton    -3.521e-01  1.909e-02 -18.442  < 2e-16 ***
## regionColumbus            -3.084e-01  1.909e-02 -16.157  < 2e-16 ***
## regionDallasFtWorth       -4.759e-01  1.909e-02 -24.926  < 2e-16 ***
## regionDenver              -3.425e-01  1.909e-02 -17.940  < 2e-16 ***
## regionDetroit             -2.866e-01  1.910e-02 -15.008  < 2e-16 ***
## regionGrandRapids         -5.688e-02  1.909e-02  -2.979 0.002894 ** 
## regionGreatLakes          -2.292e-01  1.923e-02 -11.918  < 2e-16 ***
## regionHarrisburgScranton  -4.787e-02  1.909e-02  -2.508 0.012166 *  
## regionHartfordSpringfield  2.576e-01  1.909e-02  13.492  < 2e-16 ***
## regionHouston             -5.134e-01  1.909e-02 -26.894  < 2e-16 ***
## regionIndianapolis        -2.473e-01  1.909e-02 -12.954  < 2e-16 ***
## regionJacksonville        -5.015e-02  1.909e-02  -2.627 0.008615 ** 
## regionLasVegas            -1.801e-01  1.909e-02  -9.434  < 2e-16 ***
## regionLosAngeles          -3.493e-01  1.915e-02 -18.243  < 2e-16 ***
## regionLouisville          -2.744e-01  1.909e-02 -14.375  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.909e-02  -6.958 3.58e-12 ***
## regionMidsouth            -1.577e-01  1.910e-02  -8.257  < 2e-16 ***
## regionNashville           -3.490e-01  1.909e-02 -18.282  < 2e-16 ***
## regionNewOrleansMobile    -2.567e-01  1.909e-02 -13.448  < 2e-16 ***
## regionNewYork              1.662e-01  1.909e-02   8.706  < 2e-16 ***
## regionNortheast            3.955e-02  1.910e-02   2.071 0.038381 *  
## regionNorthernNewEngland  -8.371e-02  1.909e-02  -4.385 1.17e-05 ***
## regionOrlando             -5.503e-02  1.909e-02  -2.883 0.003945 ** 
## regionPhiladelphia         7.103e-02  1.909e-02   3.721 0.000199 ***
## regionPhoenixTucson       -3.367e-01  1.909e-02 -17.638  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.909e-02 -10.305  < 2e-16 ***
## regionPlains              -1.257e-01  1.909e-02  -6.581 4.80e-11 ***
## regionPortland            -2.434e-01  1.909e-02 -12.748  < 2e-16 ***
## regionRaleighGreensboro   -5.972e-03  1.909e-02  -0.313 0.754415    
## regionRichmondNorfolk     -2.698e-01  1.909e-02 -14.132  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.909e-02 -16.404  < 2e-16 ***
## regionSacramento           6.036e-02  1.909e-02   3.162 0.001571 ** 
## regionSanDiego            -1.630e-01  1.909e-02  -8.537  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.909e-02  12.728  < 2e-16 ***
## regionSeattle             -1.185e-01  1.909e-02  -6.207 5.52e-10 ***
## regionSouthCarolina       -1.579e-01  1.909e-02  -8.274  < 2e-16 ***
## regionSouthCentral        -4.625e-01  1.911e-02 -24.199  < 2e-16 ***
## regionSoutheast           -1.656e-01  1.911e-02  -8.667  < 2e-16 ***
## regionSpokane             -1.154e-01  1.909e-02  -6.045 1.52e-09 ***
## regionStLouis             -1.306e-01  1.909e-02  -6.842 8.08e-12 ***
## regionSyracuse            -4.071e-02  1.909e-02  -2.132 0.032984 *  
## regionTampa               -1.524e-01  1.909e-02  -7.983 1.52e-15 ***
## regionTotalUS             -2.647e-01  2.066e-02 -12.815  < 2e-16 ***
## regionWest                -2.897e-01  1.909e-02 -15.171  < 2e-16 ***
## regionWestTexNewMexico    -2.969e-01  1.913e-02 -15.518  < 2e-16 ***
## quarter2                   8.058e-02  5.412e-03  14.891  < 2e-16 ***
## quarter3                   2.181e-01  5.414e-03  40.293  < 2e-16 ***
## quarter4                   1.621e-01  5.375e-03  30.154  < 2e-16 ***
## year2016                  -3.791e-02  4.695e-03  -8.075 7.16e-16 ***
## year2017                   1.375e-01  4.680e-03  29.381  < 2e-16 ***
## year2018                   8.547e-02  8.360e-03  10.223  < 2e-16 ***
## x_large_bags               3.583e-07  1.246e-07   2.877 0.004025 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2482 on 18187 degrees of freedom
## Multiple R-squared:  0.6214, Adjusted R-squared:  0.6202 
## F-statistic: 489.4 on 61 and 18187 DF,  p-value: < 2.2e-16

Overall, we still have some heteroscedasticity and deviations from normality in the residuals. In the regression summary, x_large_bags is a statistically significant explanatory variable (p ≈ 0.004). But hmmm… with four predictors our overall R^2 was 0.6213, and with five we’ve only reached 0.6214. Given that there is no real increase in explanatory performance, even though the term is significant, we will remove it and carry on with the four-predictor model.

It’s also clear we aren’t gaining much by adding further main effects. The final thing we can do is test for interactions.
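
R^2 can never fall when a term is added, so it helps to put a penalised measure next to it when deciding whether a significant but weak predictor earns its place. The sketch below tabulates R^2, adjusted R^2 and BIC for two nested models. It uses mtcars as stand-in data, and base_model, extended_model and fit_summary are hypothetical names.

```r
# Stand-in data (assumption): not the avocado data
cars <- transform(mtcars, cyl = factor(cyl))

# The extended model adds one extra numeric predictor to the base model
base_model     <- lm(mpg ~ cyl + wt, data = cars)
extended_model <- update(base_model, . ~ . + qsec)

# Collect the fit measures side by side
fit_summary <- data.frame(
  model         = c("base", "extended"),
  r_squared     = c(summary(base_model)$r.squared,
                    summary(extended_model)$r.squared),
  adj_r_squared = c(summary(base_model)$adj.r.squared,
                    summary(extended_model)$adj.r.squared),
  bic           = c(BIC(base_model), BIC(extended_model))
)
fit_summary

# Gain in R^2 from the extra term; lower BIC indicates the preferred model
diff(fit_summary$r_squared)
```

With a large sample, even a very small effect can come out as significant, which is why the size of the gain is worth checking alongside the p-value.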


2.7 Pair interaction

Let’s now think about possible pair interactions. We have four main effect variables (type + region + quarter + year), so there are six possible pair interactions:

  • type:region
  • type:quarter
  • type:year
  • region:quarter
  • region:year
  • quarter:year
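
The six pairs above are simply all the ways of choosing two of the four main effects, so they can also be generated and fitted in a loop. The sketch below shows one way to do this with combn() and update(). It uses mtcars with four stand-in factors rather than the avocado data, and main_effects, interaction_terms and adj_r2 are hypothetical names.

```r
# Stand-in data (assumption): four factors playing the role of the main effects
cars <- transform(mtcars,
                  cyl  = factor(cyl),
                  am   = factor(am),
                  gear = factor(gear),
                  vs   = factor(vs))

main_effects <- c("cyl", "am", "gear", "vs")

# Main-effects-only model: mpg ~ cyl + am + gear + vs
base_model <- lm(reformulate(main_effects, response = "mpg"), data = cars)

# All choose(4, 2) = 6 pairs, written as interaction terms such as "cyl:am"
interaction_terms <- combn(main_effects, 2, FUN = paste, collapse = ":")

# Add each interaction in turn to the base model and record adjusted R^2
adj_r2 <- sapply(interaction_terms, function(term) {
  fit <- update(base_model, as.formula(paste(". ~ . +", term)))
  summary(fit)$adj.r.squared
})

# Best-performing interaction first
sort(adj_r2, decreasing = TRUE)
```

Adjusted R^2 is used here because interactions involving a factor with many levels add many coefficients at once. Fitting each model by hand has the advantage that the full summary for every model gets inspected.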

Let’s test these now:


model5pa <- lm(average_price ~ type + region + quarter + year + type:region, data = trimmed_avocados)
summary(model5pa)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     type:region, data = trimmed_avocados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0082 -0.1335 -0.0024  0.1335  1.4799 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            1.202843   0.018542  64.870  < 2e-16 ***
## typeorganic                            0.424556   0.025580  16.597  < 2e-16 ***
## regionAtlanta                         -0.279941   0.025580 -10.944  < 2e-16 ***
## regionBaltimoreWashington             -0.004556   0.025580  -0.178 0.858635    
## regionBoise                           -0.272722   0.025580 -10.661  < 2e-16 ***
## regionBoston                          -0.044379   0.025580  -1.735 0.082778 .  
## regionBuffaloRochester                 0.033550   0.025580   1.312 0.189681    
## regionCalifornia                      -0.243314   0.025580  -9.512  < 2e-16 ***
## regionCharlotte                       -0.073669   0.025580  -2.880 0.003983 ** 
## regionChicago                          0.020592   0.025580   0.805 0.420838    
## regionCincinnatiDayton                -0.333254   0.025580 -13.028  < 2e-16 ***
## regionColumbus                        -0.282485   0.025580 -11.043  < 2e-16 ***
## regionDallasFtWorth                   -0.502308   0.025580 -19.637  < 2e-16 ***
## regionDenver                          -0.274793   0.025580 -10.742  < 2e-16 ***
## regionDetroit                         -0.224793   0.025580  -8.788  < 2e-16 ***
## regionGrandRapids                     -0.023728   0.025580  -0.928 0.353635    
## regionGreatLakes                      -0.166864   0.025580  -6.523 7.07e-11 ***
## regionHarrisburgScranton              -0.089941   0.025580  -3.516 0.000439 ***
## regionHartfordSpringfield              0.059290   0.025580   2.318 0.020471 *  
## regionHouston                         -0.523669   0.025580 -20.472  < 2e-16 ***
## regionIndianapolis                    -0.203905   0.025580  -7.971 1.66e-15 ***
## regionJacksonville                    -0.155148   0.025580  -6.065 1.34e-09 ***
## regionLasVegas                        -0.335799   0.025580 -13.127  < 2e-16 ***
## regionLosAngeles                      -0.372308   0.025580 -14.555  < 2e-16 ***
## regionLouisville                      -0.243432   0.025580  -9.516  < 2e-16 ***
## regionMiamiFtLauderdale               -0.094438   0.025580  -3.692 0.000223 ***
## regionMidsouth                        -0.141598   0.025580  -5.535 3.15e-08 ***
## regionNashville                       -0.335858   0.025580 -13.130  < 2e-16 ***
## regionNewOrleansMobile                -0.263491   0.025580 -10.301  < 2e-16 ***
## regionNewYork                          0.053373   0.025580   2.086 0.036948 *  
## regionNortheast                       -0.004320   0.025580  -0.169 0.865907    
## regionNorthernNewEngland              -0.088521   0.025580  -3.461 0.000540 ***
## regionOrlando                         -0.134320   0.025580  -5.251 1.53e-07 ***
## regionPhiladelphia                     0.047574   0.025580   1.860 0.062930 .  
## regionPhoenixTucson                   -0.620533   0.025580 -24.258  < 2e-16 ***
## regionPittsburgh                      -0.098107   0.025580  -3.835 0.000126 ***
## regionPlains                          -0.183254   0.025580  -7.164 8.14e-13 ***
## regionPortland                        -0.302249   0.025580 -11.816  < 2e-16 ***
## regionRaleighGreensboro               -0.121657   0.025580  -4.756 1.99e-06 ***
## regionRichmondNorfolk                 -0.228935   0.025580  -8.950  < 2e-16 ***
## regionRoanoke                         -0.252722   0.025580  -9.880  < 2e-16 ***
## regionSacramento                      -0.074793   0.025580  -2.924 0.003461 ** 
## regionSanDiego                        -0.287278   0.025580 -11.230  < 2e-16 ***
## regionSanFrancisco                     0.048402   0.025580   1.892 0.058483 .  
## regionSeattle                         -0.178994   0.025580  -6.997 2.70e-12 ***
## regionSouthCarolina                   -0.202544   0.025580  -7.918 2.55e-15 ***
## regionSouthCentral                    -0.479349   0.025580 -18.739  < 2e-16 ***
## regionSoutheast                       -0.185740   0.025580  -7.261 4.00e-13 ***
## regionSpokane                         -0.232781   0.025580  -9.100  < 2e-16 ***
## regionStLouis                         -0.163018   0.025580  -6.373 1.90e-10 ***
## regionSyracuse                         0.038166   0.025580   1.492 0.135716    
## regionTampa                           -0.147160   0.025580  -5.753 8.91e-09 ***
## regionTotalUS                         -0.256746   0.025580 -10.037  < 2e-16 ***
## regionWest                            -0.363669   0.025580 -14.217  < 2e-16 ***
## regionWestTexNewMexico                -0.506627   0.025580 -19.805  < 2e-16 ***
## quarter2                               0.081206   0.005125  15.846  < 2e-16 ***
## quarter3                               0.218901   0.005124  42.721  < 2e-16 ***
## quarter4                               0.162013   0.005092  31.814  < 2e-16 ***
## year2016                              -0.037010   0.004438  -8.340  < 2e-16 ***
## year2017                               0.138688   0.004417  31.396  < 2e-16 ***
## year2018                               0.087411   0.007895  11.071  < 2e-16 ***
## typeorganic:regionAtlanta              0.113728   0.036176   3.144 0.001671 ** 
## typeorganic:regionBaltimoreWashington -0.044497   0.036176  -1.230 0.218705    
## typeorganic:regionBoise                0.119645   0.036176   3.307 0.000944 ***
## typeorganic:regionBoston               0.028462   0.036176   0.787 0.431435    
## typeorganic:regionBuffaloRochester    -0.155503   0.036176  -4.299 1.73e-05 ***
## typeorganic:regionCalifornia           0.155207   0.036176   4.290 1.79e-05 ***
## typeorganic:regionCharlotte            0.237337   0.036176   6.561 5.50e-11 ***
## typeorganic:regionChicago             -0.049704   0.036176  -1.374 0.169471    
## typeorganic:regionCincinnatiDayton    -0.037160   0.036176  -1.027 0.304341    
## typeorganic:regionColumbus            -0.051538   0.036176  -1.425 0.154271    
## typeorganic:regionDallasFtWorth        0.053728   0.036176   1.485 0.137512    
## typeorganic:regionDenver              -0.135325   0.036176  -3.741 0.000184 ***
## typeorganic:regionDetroit             -0.120296   0.036176  -3.325 0.000885 ***
## typeorganic:regionGrandRapids         -0.064615   0.036176  -1.786 0.074092 .  
## typeorganic:regionGreatLakes          -0.111243   0.036176  -3.075 0.002108 ** 
## typeorganic:regionHarrisburgScranton   0.084379   0.036176   2.332 0.019687 *  
## typeorganic:regionHartfordSpringfield  0.396627   0.036176  10.964  < 2e-16 ***
## typeorganic:regionHouston              0.021124   0.036176   0.584 0.559273    
## typeorganic:regionIndianapolis        -0.086272   0.036176  -2.385 0.017099 *  
## typeorganic:regionJacksonville         0.210118   0.036176   5.808 6.42e-09 ***
## typeorganic:regionLasVegas             0.311361   0.036176   8.607  < 2e-16 ***
## typeorganic:regionLosAngeles           0.054556   0.036176   1.508 0.131550    
## typeorganic:regionLouisville          -0.061834   0.036176  -1.709 0.087418 .  
## typeorganic:regionMiamiFtLauderdale   -0.076213   0.036176  -2.107 0.035154 *  
## typeorganic:regionMidsouth            -0.029349   0.036176  -0.811 0.417210    
## typeorganic:regionNashville           -0.026154   0.036176  -0.723 0.469711    
## typeorganic:regionNewOrleansMobile     0.014497   0.036176   0.401 0.688618    
## typeorganic:regionNewYork              0.226331   0.036176   6.256 4.03e-10 ***
## typeorganic:regionNortheast            0.090414   0.036176   2.499 0.012453 *  
## typeorganic:regionNorthernNewEngland   0.009763   0.036176   0.270 0.787252    
## typeorganic:regionOrlando              0.158994   0.036176   4.395 1.11e-05 ***
## typeorganic:regionPhiladelphia         0.047041   0.036176   1.300 0.193496    
## typeorganic:regionPhoenixTucson        0.567870   0.036176  15.697  < 2e-16 ***
## typeorganic:regionPittsburgh          -0.197219   0.036176  -5.452 5.05e-08 ***
## typeorganic:regionPlains               0.117456   0.036176   3.247 0.001169 ** 
## typeorganic:regionPortland             0.117870   0.036176   3.258 0.001123 ** 
## typeorganic:regionRaleighGreensboro    0.231479   0.036176   6.399 1.61e-10 ***
## typeorganic:regionRichmondNorfolk     -0.081538   0.036176  -2.254 0.024211 *  
## typeorganic:regionRoanoke             -0.120769   0.036176  -3.338 0.000844 ***
## typeorganic:regionSacramento           0.270651   0.036176   7.482 7.68e-14 ***
## typeorganic:regionSanDiego             0.248817   0.036176   6.878 6.27e-12 ***
## typeorganic:regionSanFrancisco         0.389527   0.036176  10.768  < 2e-16 ***
## typeorganic:regionSeattle              0.121065   0.036176   3.347 0.000820 ***
## typeorganic:regionSouthCarolina        0.089586   0.036176   2.476 0.013281 *  
## typeorganic:regionSouthCentral         0.039112   0.036176   1.081 0.279633    
## typeorganic:regionSoutheast            0.045444   0.036176   1.256 0.209063    
## typeorganic:regionSpokane              0.234675   0.036176   6.487 8.98e-11 ***
## typeorganic:regionStLouis              0.065207   0.036176   1.803 0.071483 .  
## typeorganic:regionSyracuse            -0.157751   0.036176  -4.361 1.30e-05 ***
## typeorganic:regionTampa               -0.010059   0.036176  -0.278 0.780967    
## typeorganic:regionTotalUS              0.029467   0.036176   0.815 0.415334    
## typeorganic:regionWest                 0.149704   0.036176   4.138 3.52e-05 ***
## typeorganic:regionWestTexNewMexico     0.423157   0.036257  11.671  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2351 on 18135 degrees of freedom
## Multiple R-squared:  0.6611, Adjusted R-squared:  0.659 
## F-statistic: 313.1 on 113 and 18135 DF,  p-value: < 2.2e-16
model5pb <- lm(average_price ~ type + region + quarter + year + type:quarter, data = trimmed_avocados)
summary(model5pb)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     type:quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.02358 -0.14643 -0.00311  0.14370  1.44227 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.180432   0.014545  81.158  < 2e-16 ***
## typeorganic                0.469434   0.006682  70.256  < 2e-16 ***
## regionAtlanta             -0.223077   0.019073 -11.696  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.019073  -1.405 0.159924    
## regionBoise               -0.212899   0.019073 -11.162  < 2e-16 ***
## regionBoston              -0.030148   0.019073  -1.581 0.113971    
## regionBuffaloRochester    -0.044201   0.019073  -2.317 0.020488 *  
## regionCalifornia          -0.165710   0.019073  -8.688  < 2e-16 ***
## regionCharlotte            0.045000   0.019073   2.359 0.018316 *  
## regionChicago             -0.004260   0.019073  -0.223 0.823248    
## regionCincinnatiDayton    -0.351834   0.019073 -18.447  < 2e-16 ***
## regionColumbus            -0.308254   0.019073 -16.162  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.019073 -24.928  < 2e-16 ***
## regionDenver              -0.342456   0.019073 -17.955  < 2e-16 ***
## regionDetroit             -0.284941   0.019073 -14.940  < 2e-16 ***
## regionGrandRapids         -0.056036   0.019073  -2.938 0.003308 ** 
## regionGreatLakes          -0.222485   0.019073 -11.665  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.019073  -2.504 0.012301 *  
## regionHartfordSpringfield  0.257604   0.019073  13.506  < 2e-16 ***
## regionHouston             -0.513107   0.019073 -26.902  < 2e-16 ***
## regionIndianapolis        -0.247041   0.019073 -12.953  < 2e-16 ***
## regionJacksonville        -0.050089   0.019073  -2.626 0.008642 ** 
## regionLasVegas            -0.180118   0.019073  -9.444  < 2e-16 ***
## regionLosAngeles          -0.345030   0.019073 -18.090  < 2e-16 ***
## regionLouisville          -0.274349   0.019073 -14.384  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.019073  -6.949 3.79e-12 ***
## regionMidsouth            -0.156272   0.019073  -8.193 2.71e-16 ***
## regionNashville           -0.348935   0.019073 -18.295  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.019073 -13.435  < 2e-16 ***
## regionNewYork              0.166538   0.019073   8.732  < 2e-16 ***
## regionNortheast            0.040888   0.019073   2.144 0.032066 *  
## regionNorthernNewEngland  -0.083639   0.019073  -4.385 1.17e-05 ***
## regionOrlando             -0.054822   0.019073  -2.874 0.004053 ** 
## regionPhiladelphia         0.071095   0.019073   3.728 0.000194 ***
## regionPhoenixTucson       -0.336598   0.019073 -17.648  < 2e-16 ***
## regionPittsburgh          -0.196716   0.019073 -10.314  < 2e-16 ***
## regionPlains              -0.124527   0.019073  -6.529 6.80e-11 ***
## regionPortland            -0.243314   0.019073 -12.757  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.019073  -0.310 0.756382    
## regionRichmondNorfolk     -0.269704   0.019073 -14.141  < 2e-16 ***
## regionRoanoke             -0.313107   0.019073 -16.416  < 2e-16 ***
## regionSacramento           0.060533   0.019073   3.174 0.001507 ** 
## regionSanDiego            -0.162870   0.019073  -8.539  < 2e-16 ***
## regionSanFrancisco         0.243166   0.019073  12.749  < 2e-16 ***
## regionSeattle             -0.118462   0.019073  -6.211 5.38e-10 ***
## regionSouthCarolina       -0.157751   0.019073  -8.271  < 2e-16 ***
## regionSouthCentral        -0.459793   0.019073 -24.107  < 2e-16 ***
## regionSoutheast           -0.163018   0.019073  -8.547  < 2e-16 ***
## regionSpokane             -0.115444   0.019073  -6.053 1.45e-09 ***
## regionStLouis             -0.130414   0.019073  -6.838 8.30e-12 ***
## regionSyracuse            -0.040710   0.019073  -2.134 0.032819 *  
## regionTampa               -0.152189   0.019073  -7.979 1.56e-15 ***
## regionTotalUS             -0.242012   0.019073 -12.689  < 2e-16 ***
## regionWest                -0.288817   0.019073 -15.143  < 2e-16 ***
## regionWestTexNewMexico    -0.296626   0.019116 -15.518  < 2e-16 ***
## quarter2                   0.066217   0.007413   8.933  < 2e-16 ***
## quarter3                   0.186137   0.007413  25.110  < 2e-16 ***
## quarter4                   0.152474   0.007364  20.706  < 2e-16 ***
## year2016                  -0.036977   0.004679  -7.902 2.89e-15 ***
## year2017                   0.138659   0.004658  29.768  < 2e-16 ***
## year2018                   0.087412   0.008325  10.500  < 2e-16 ***
## typeorganic:quarter2       0.029809   0.010152   2.936 0.003325 ** 
## typeorganic:quarter3       0.065528   0.010150   6.456 1.10e-10 ***
## typeorganic:quarter4       0.018995   0.010079   1.885 0.059501 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2479 on 18185 degrees of freedom
## Multiple R-squared:  0.6222, Adjusted R-squared:  0.6209 
## F-statistic: 475.3 on 63 and 18185 DF,  p-value: < 2.2e-16
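In the output above two of the three type:quarter coefficients are clearly significant, but typeorganic:quarter4 is only marginal (p = 0.0595). The individual t-tests do not say whether the interaction as a whole earns its place. One common way to check that is an F-test between the nested models with and without the term, using `anova()`. This is a sketch on the built-in mtcars data as a stand-in (not the avocado data); the model names `reduced` and `full` are illustrative.

```r
# Stand-in data: mtcars ships with R; it is NOT the avocado data.
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Reduced model: main effects only.
reduced <- lm(mpg ~ wt + cyl + am, data = cars)

# Full model: the same terms plus one interaction.
full <- update(reduced, . ~ . + cyl:am)

# F-test of the nested pair: one p-value for the whole interaction,
# rather than one per dummy-variable coefficient.
comparison <- anova(reduced, full)
print(comparison)

# The second row holds the test of the added term.
interaction_p <- comparison[["Pr(>F)"]][2]
interaction_p
```

For the avocado models the equivalent call would compare the main-effects fit against model5pb, assuming both were fitted on the same rows of trimmed_avocados.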
# Interaction candidate: type:year (does the organic premium change from year to year?)
model5pc <- lm(average_price ~ type + region + quarter + year + type:year, data = trimmed_avocados)
summary(model5pc)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     type:year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00911 -0.14461 -0.00436  0.13900  1.46703 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.117496   0.014421  77.493  < 2e-16 ***
## typeorganic                0.595327   0.006565  90.688  < 2e-16 ***
## regionAtlanta             -0.223077   0.018919 -11.791  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.018919  -1.417 0.156565    
## regionBoise               -0.212899   0.018919 -11.253  < 2e-16 ***
## regionBoston              -0.030148   0.018919  -1.593 0.111069    
## regionBuffaloRochester    -0.044201   0.018919  -2.336 0.019488 *  
## regionCalifornia          -0.165710   0.018919  -8.759  < 2e-16 ***
## regionCharlotte            0.045000   0.018919   2.379 0.017393 *  
## regionChicago             -0.004260   0.018919  -0.225 0.821839    
## regionCincinnatiDayton    -0.351834   0.018919 -18.596  < 2e-16 ***
## regionColumbus            -0.308254   0.018919 -16.293  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.018919 -25.130  < 2e-16 ***
## regionDenver              -0.342456   0.018919 -18.101  < 2e-16 ***
## regionDetroit             -0.284941   0.018919 -15.061  < 2e-16 ***
## regionGrandRapids         -0.056036   0.018919  -2.962 0.003063 ** 
## regionGreatLakes          -0.222485   0.018919 -11.760  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.018919  -2.524 0.011613 *  
## regionHartfordSpringfield  0.257604   0.018919  13.616  < 2e-16 ***
## regionHouston             -0.513107   0.018919 -27.121  < 2e-16 ***
## regionIndianapolis        -0.247041   0.018919 -13.058  < 2e-16 ***
## regionJacksonville        -0.050089   0.018919  -2.647 0.008117 ** 
## regionLasVegas            -0.180118   0.018919  -9.520  < 2e-16 ***
## regionLosAngeles          -0.345030   0.018919 -18.237  < 2e-16 ***
## regionLouisville          -0.274349   0.018919 -14.501  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.018919  -7.006 2.54e-12 ***
## regionMidsouth            -0.156272   0.018919  -8.260  < 2e-16 ***
## regionNashville           -0.348935   0.018919 -18.443  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.018919 -13.544  < 2e-16 ***
## regionNewYork              0.166538   0.018919   8.802  < 2e-16 ***
## regionNortheast            0.040888   0.018919   2.161 0.030698 *  
## regionNorthernNewEngland  -0.083639   0.018919  -4.421 9.89e-06 ***
## regionOrlando             -0.054822   0.018919  -2.898 0.003764 ** 
## regionPhiladelphia         0.071095   0.018919   3.758 0.000172 ***
## regionPhoenixTucson       -0.336598   0.018919 -17.791  < 2e-16 ***
## regionPittsburgh          -0.196716   0.018919 -10.398  < 2e-16 ***
## regionPlains              -0.124527   0.018919  -6.582 4.77e-11 ***
## regionPortland            -0.243314   0.018919 -12.860  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.018919  -0.313 0.754471    
## regionRichmondNorfolk     -0.269704   0.018919 -14.255  < 2e-16 ***
## regionRoanoke             -0.313107   0.018919 -16.549  < 2e-16 ***
## regionSacramento           0.060533   0.018919   3.199 0.001379 ** 
## regionSanDiego            -0.162870   0.018919  -8.609  < 2e-16 ***
## regionSanFrancisco         0.243166   0.018919  12.853  < 2e-16 ***
## regionSeattle             -0.118462   0.018919  -6.261 3.90e-10 ***
## regionSouthCarolina       -0.157751   0.018919  -8.338  < 2e-16 ***
## regionSouthCentral        -0.459793   0.018919 -24.303  < 2e-16 ***
## regionSoutheast           -0.163018   0.018919  -8.616  < 2e-16 ***
## regionSpokane             -0.115444   0.018919  -6.102 1.07e-09 ***
## regionStLouis             -0.130414   0.018919  -6.893 5.64e-12 ***
## regionSyracuse            -0.040710   0.018919  -2.152 0.031430 *  
## regionTampa               -0.152189   0.018919  -8.044 9.22e-16 ***
## regionTotalUS             -0.242012   0.018919 -12.792  < 2e-16 ***
## regionWest                -0.288817   0.018919 -15.266  < 2e-16 ***
## regionWestTexNewMexico    -0.296641   0.018962 -15.644  < 2e-16 ***
## quarter2                   0.081108   0.005360  15.132  < 2e-16 ***
## quarter3                   0.218901   0.005359  40.844  < 2e-16 ***
## quarter4                   0.161984   0.005327  30.410  < 2e-16 ***
## year2016                   0.027632   0.006564   4.210 2.57e-05 ***
## year2017                   0.216048   0.006533  33.069  < 2e-16 ***
## year2018                   0.165421   0.011209  14.758  < 2e-16 ***
## typeorganic:year2016      -0.129237   0.009283 -13.921  < 2e-16 ***
## typeorganic:year2017      -0.154818   0.009240 -16.755  < 2e-16 ***
## typeorganic:year2018      -0.156037   0.015159 -10.293  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.246 on 18185 degrees of freedom
## Multiple R-squared:  0.6282, Adjusted R-squared:  0.6269 
## F-statistic: 487.7 on 63 and 18185 DF,  p-value: < 2.2e-16
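To interpret the type:year terms, the interaction coefficients are added to the typeorganic main effect. The short calculation below does that arithmetic with the estimates copied from the summary(model5pc) output above. The vector names are only for illustration, and "baseline" means whichever year is the reference level omitted from the output.

```r
# Estimates copied from the summary(model5pc) output above.
organic_base <- 0.595327           # typeorganic (premium in the baseline year)
organic_by_year <- c(
  year2016 = -0.129237,            # typeorganic:year2016
  year2017 = -0.154818,            # typeorganic:year2017
  year2018 = -0.156037             # typeorganic:year2018
)

# Organic premium over the baseline type in each year,
# holding region and quarter fixed.
organic_premium <- c(baseline = organic_base, organic_base + organic_by_year)
print(round(organic_premium, 3))
```

So the fitted organic premium falls from about 0.60 in the baseline year to about 0.47 in 2016 and roughly 0.44 in 2017 and 2018, in the units of average_price.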
# Interaction candidate: region:quarter (does seasonality differ between regions?)
model5pd <- lm(average_price ~ type + region + quarter + year + region:quarter, data = trimmed_avocados)
summary(model5pd)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     region:quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06598 -0.14588  0.00059  0.14115  1.38051 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.216463   0.024241  50.182  < 2e-16 ***
## typeorganic                         0.495917   0.003583 138.408  < 2e-16 ***
## regionAtlanta                      -0.257647   0.033888  -7.603 3.04e-14 ***
## regionBaltimoreWashington          -0.089804   0.033888  -2.650 0.008056 ** 
## regionBoise                        -0.285392   0.033888  -8.422  < 2e-16 ***
## regionBoston                       -0.007059   0.033888  -0.208 0.835000    
## regionBuffaloRochester             -0.031078   0.033888  -0.917 0.359111    
## regionCalifornia                   -0.279706   0.033888  -8.254  < 2e-16 ***
## regionCharlotte                    -0.021471   0.033888  -0.634 0.526370    
## regionChicago                      -0.073627   0.033888  -2.173 0.029820 *  
## regionCincinnatiDayton             -0.434902   0.033888 -12.833  < 2e-16 ***
## regionColumbus                     -0.324804   0.033888  -9.585  < 2e-16 ***
## regionDallasFtWorth                -0.484510   0.033888 -14.297  < 2e-16 ***
## regionDenver                       -0.421569   0.033888 -12.440  < 2e-16 ***
## regionDetroit                      -0.305000   0.033888  -9.000  < 2e-16 ***
## regionGrandRapids                  -0.128235   0.033888  -3.784 0.000155 ***
## regionGreatLakes                   -0.268137   0.033888  -7.912 2.67e-15 ***
## regionHarrisburgScranton           -0.060000   0.033888  -1.771 0.076657 .  
## regionHartfordSpringfield           0.229020   0.033888   6.758 1.44e-11 ***
## regionHouston                      -0.537059   0.033888 -15.848  < 2e-16 ***
## regionIndianapolis                 -0.273824   0.033888  -8.080 6.87e-16 ***
## regionJacksonville                 -0.110392   0.033888  -3.258 0.001126 ** 
## regionLasVegas                     -0.290686   0.033888  -8.578  < 2e-16 ***
## regionLosAngeles                   -0.433039   0.033888 -12.778  < 2e-16 ***
## regionLouisville                   -0.295490   0.033888  -8.720  < 2e-16 ***
## regionMiamiFtLauderdale            -0.111863   0.033888  -3.301 0.000966 ***
## regionMidsouth                     -0.194510   0.033888  -5.740 9.64e-09 ***
## regionNashville                    -0.351275   0.033888 -10.366  < 2e-16 ***
## regionNewOrleansMobile             -0.317255   0.033888  -9.362  < 2e-16 ***
## regionNewYork                       0.105098   0.033888   3.101 0.001930 ** 
## regionNortheast                     0.020000   0.033888   0.590 0.555082    
## regionNorthernNewEngland           -0.059804   0.033888  -1.765 0.077625 .  
## regionOrlando                      -0.103431   0.033888  -3.052 0.002276 ** 
## regionPhiladelphia                  0.016569   0.033888   0.489 0.624905    
## regionPhoenixTucson                -0.445294   0.033888 -13.140  < 2e-16 ***
## regionPittsburgh                   -0.174510   0.033888  -5.150 2.64e-07 ***
## regionPlains                       -0.184412   0.033888  -5.442 5.34e-08 ***
## regionPortland                     -0.353235   0.033888 -10.424  < 2e-16 ***
## regionRaleighGreensboro            -0.058039   0.033888  -1.713 0.086792 .  
## regionRichmondNorfolk              -0.263627   0.033888  -7.779 7.68e-15 ***
## regionRoanoke                      -0.312255   0.033888  -9.214  < 2e-16 ***
## regionSacramento                   -0.027059   0.033888  -0.798 0.424608    
## regionSanDiego                     -0.286667   0.033888  -8.459  < 2e-16 ***
## regionSanFrancisco                  0.090588   0.033888   2.673 0.007521 ** 
## regionSeattle                      -0.258824   0.033888  -7.638 2.32e-14 ***
## regionSouthCarolina                -0.206961   0.033888  -6.107 1.04e-09 ***
## regionSouthCentral                 -0.475686   0.033888 -14.037  < 2e-16 ***
## regionSoutheast                    -0.207255   0.033888  -6.116 9.80e-10 ***
## regionSpokane                      -0.269608   0.033888  -7.956 1.88e-15 ***
## regionStLouis                      -0.190980   0.033888  -5.636 1.77e-08 ***
## regionSyracuse                     -0.027647   0.033888  -0.816 0.414609    
## regionTampa                        -0.153235   0.033888  -4.522 6.17e-06 ***
## regionTotalUS                      -0.290392   0.033888  -8.569  < 2e-16 ***
## regionWest                         -0.389020   0.033888 -11.479  < 2e-16 ***
## regionWestTexNewMexico             -0.365980   0.033888 -10.800  < 2e-16 ***
## quarter2                            0.085685   0.036447   2.351 0.018736 *  
## quarter3                            0.093249   0.036447   2.558 0.010521 *  
## quarter4                            0.071967   0.036188   1.989 0.046752 *  
## year2016                           -0.036996   0.004567  -8.100 5.83e-16 ***
## year2017                            0.138600   0.004546  30.485  < 2e-16 ***
## year2018                            0.087387   0.008126  10.754  < 2e-16 ***
## regionAtlanta:quarter2             -0.088379   0.051480  -1.717 0.086041 .  
## regionBaltimoreWashington:quarter2  0.092368   0.051480   1.794 0.072790 .  
## regionBoise:quarter2               -0.095505   0.051480  -1.855 0.063585 .  
## regionBoston:quarter2               0.011418   0.051480   0.222 0.824479    
## regionBuffaloRochester:quarter2     0.081719   0.051480   1.587 0.112440    
## regionCalifornia:quarter2           0.003552   0.051480   0.069 0.944992    
## regionCharlotte:quarter2            0.062240   0.051480   1.209 0.226676    
## regionChicago:quarter2             -0.004193   0.051480  -0.081 0.935085    
## regionCincinnatiDayton:quarter2     0.010030   0.051480   0.195 0.845524    
## regionColumbus:quarter2            -0.094042   0.051480  -1.827 0.067751 .  
## regionDallasFtWorth:quarter2       -0.078439   0.051480  -1.524 0.127607    
## regionDenver:quarter2              -0.015739   0.051480  -0.306 0.759813    
## regionDetroit:quarter2             -0.036923   0.051480  -0.717 0.473241    
## regionGrandRapids:quarter2          0.135799   0.051480   2.638 0.008349 ** 
## regionGreatLakes:quarter2          -0.011478   0.051480  -0.223 0.823567    
## regionHarrisburgScranton:quarter2   0.065513   0.051480   1.273 0.203181    
## regionHartfordSpringfield:quarter2  0.067262   0.051480   1.307 0.191375    
## regionHouston:quarter2             -0.089223   0.051480  -1.733 0.083084 .  
## regionIndianapolis:quarter2        -0.064253   0.051480  -1.248 0.212003    
## regionJacksonville:quarter2         0.028213   0.051480   0.548 0.583677    
## regionLasVegas:quarter2            -0.074314   0.051480  -1.444 0.148885    
## regionLosAngeles:quarter2          -0.060679   0.051480  -1.179 0.238540    
## regionLouisville:quarter2          -0.074510   0.051480  -1.447 0.147816    
## regionMiamiFtLauderdale:quarter2   -0.009676   0.051480  -0.188 0.850917    
## regionMidsouth:quarter2            -0.013952   0.051480  -0.271 0.786385    
## regionNashville:quarter2           -0.102572   0.051480  -1.992 0.046336 *  
## regionNewOrleansMobile:quarter2     0.083793   0.051480   1.628 0.103609    
## regionNewYork:quarter2              0.087722   0.051480   1.704 0.088397 .  
## regionNortheast:quarter2            0.056410   0.051480   1.096 0.273195    
## regionNorthernNewEngland:quarter2  -0.067632   0.051480  -1.314 0.188947    
## regionOrlando:quarter2              0.018047   0.051480   0.351 0.725924    
## regionPhiladelphia:quarter2         0.109970   0.051480   2.136 0.032680 *  
## regionPhoenixTucson:quarter2       -0.020090   0.051480  -0.390 0.696351    
## regionPittsburgh:quarter2          -0.038054   0.051480  -0.739 0.459792    
## regionPlains:quarter2              -0.002896   0.051480  -0.056 0.955141    
## regionPortland:quarter2            -0.045354   0.051480  -0.881 0.378324    
## regionRaleighGreensboro:quarter2    0.001885   0.051480   0.037 0.970786    
## regionRichmondNorfolk:quarter2     -0.113552   0.051480  -2.206 0.027414 *  
## regionRoanoke:quarter2             -0.131207   0.051480  -2.549 0.010821 *  
## regionSacramento:quarter2           0.084238   0.051480   1.636 0.101788    
## regionSanDiego:quarter2            -0.003333   0.051480  -0.065 0.948374    
## regionSanFrancisco:quarter2         0.121976   0.051480   2.369 0.017828 *  
## regionSeattle:quarter2              0.012029   0.051480   0.234 0.815254    
## regionSouthCarolina:quarter2        0.027602   0.051480   0.536 0.591851    
## regionSouthCentral:quarter2        -0.072262   0.051480  -1.404 0.160426    
## regionSoutheast:quarter2           -0.005950   0.051480  -0.116 0.907984    
## regionSpokane:quarter2              0.009736   0.051480   0.189 0.849999    
## regionStLouis:quarter2              0.057006   0.051480   1.107 0.268161    
## regionSyracuse:quarter2             0.064955   0.051480   1.262 0.207057    
## regionTampa:quarter2                0.006056   0.051480   0.118 0.906359    
## regionTotalUS:quarter2             -0.009223   0.051480  -0.179 0.857813    
## regionWest:quarter2                -0.029186   0.051480  -0.567 0.570770    
## regionWestTexNewMexico:quarter2    -0.096213   0.051672  -1.862 0.062620 .  
## regionAtlanta:quarter3              0.122391   0.051480   2.377 0.017444 *  
## regionBaltimoreWashington:quarter3  0.095830   0.051480   1.861 0.062691 .  
## regionBoise:quarter3                0.251931   0.051480   4.894 9.98e-07 ***
## regionBoston:quarter3              -0.001146   0.051480  -0.022 0.982235    
## regionBuffaloRochester:quarter3    -0.034050   0.051480  -0.661 0.508354    
## regionCalifornia:quarter3           0.255860   0.051480   4.970 6.75e-07 ***
## regionCharlotte:quarter3            0.139804   0.051480   2.716 0.006620 ** 
## regionChicago:quarter3              0.174012   0.051480   3.380 0.000726 ***
## regionCincinnatiDayton:quarter3     0.212594   0.051480   4.130 3.65e-05 ***
## regionColumbus:quarter3             0.109291   0.051480   2.123 0.033769 *  
## regionDallasFtWorth:quarter3        0.023228   0.051480   0.451 0.651852    
## regionDenver:quarter3               0.212466   0.051480   4.127 3.69e-05 ***
## regionDetroit:quarter3              0.054872   0.051480   1.066 0.286490    
## regionGrandRapids:quarter3          0.091440   0.051480   1.776 0.075712 .  
## regionGreatLakes:quarter3           0.123522   0.051480   2.399 0.016432 *  
## regionHarrisburgScranton:quarter3   0.006795   0.051480   0.132 0.894993    
## regionHartfordSpringfield:quarter3  0.049442   0.051480   0.960 0.336862    
## regionHouston:quarter3              0.072059   0.051480   1.400 0.161608    
## regionIndianapolis:quarter3         0.092157   0.051480   1.790 0.073447 .  
## regionJacksonville:quarter3         0.168213   0.051480   3.268 0.001087 ** 
## regionLasVegas:quarter3             0.295302   0.051480   5.736 9.84e-09 ***
## regionLosAngeles:quarter3           0.214578   0.051480   4.168 3.08e-05 ***
## regionLouisville:quarter3           0.084721   0.051480   1.646 0.099842 .  
## regionMiamiFtLauderdale:quarter3   -0.072240   0.051480  -1.403 0.160557    
## regionMidsouth:quarter3             0.095407   0.051480   1.853 0.063858 .  
## regionNashville:quarter3            0.041531   0.051480   0.807 0.419828    
## regionNewOrleansMobile:quarter3     0.071357   0.051480   1.386 0.165728    
## regionNewYork:quarter3              0.112338   0.051480   2.182 0.029110 *  
## regionNortheast:quarter3            0.050256   0.051480   0.976 0.328963    
## regionNorthernNewEngland:quarter3  -0.013658   0.051480  -0.265 0.790782    
## regionOrlando:quarter3              0.116252   0.051480   2.258 0.023946 *  
## regionPhiladelphia:quarter3         0.082149   0.051480   1.596 0.110562    
## regionPhoenixTucson:quarter3        0.260038   0.051480   5.051 4.43e-07 ***
## regionPittsburgh:quarter3          -0.016131   0.051480  -0.313 0.754019    
## regionPlains:quarter3               0.136335   0.051480   2.648 0.008097 ** 
## regionPortland:quarter3             0.334261   0.051480   6.493 8.63e-11 ***
## regionRaleighGreensboro:quarter3    0.121373   0.051480   2.358 0.018401 *  
## regionRichmondNorfolk:quarter3      0.051576   0.051480   1.002 0.316421    
## regionRoanoke:quarter3              0.090460   0.051480   1.757 0.078903 .  
## regionSacramento:quarter3           0.181161   0.051480   3.519 0.000434 ***
## regionSanDiego:quarter3             0.280385   0.051480   5.446 5.21e-08 ***
## regionSanFrancisco:quarter3         0.312360   0.051480   6.068 1.32e-09 ***
## regionSeattle:quarter3              0.392029   0.051480   7.615 2.76e-14 ***
## regionSouthCarolina:quarter3        0.102345   0.051480   1.988 0.046820 *  
## regionSouthCentral:quarter3         0.042609   0.051480   0.828 0.407859    
## regionSoutheast:quarter3            0.111357   0.051480   2.163 0.030545 *  
## regionSpokane:quarter3              0.393582   0.051480   7.645 2.19e-14 ***
## regionStLouis:quarter3              0.192134   0.051480   3.732 0.000190 ***
## regionSyracuse:quarter3            -0.036840   0.051480  -0.716 0.474236    
## regionTampa:quarter3               -0.043047   0.051480  -0.836 0.403063    
## regionTotalUS:quarter3              0.104751   0.051480   2.035 0.041887 *  
## regionWest:quarter3                 0.297609   0.051480   5.781 7.55e-09 ***
## regionWestTexNewMexico:quarter3     0.178160   0.051480   3.461 0.000540 ***
## regionAtlanta:quarter4              0.112897   0.051114   2.209 0.027206 *  
## regionBaltimoreWashington:quarter4  0.082679   0.051114   1.618 0.105780    
## regionBoise:quarter4                0.153767   0.051114   3.008 0.002631 ** 
## regionBoston:quarter4              -0.107566   0.051114  -2.104 0.035355 *  
## regionBuffaloRochester:quarter4    -0.101922   0.051114  -1.994 0.046167 *  
## regionCalifornia:quarter4           0.228706   0.051114   4.474 7.71e-06 ***
## regionCharlotte:quarter4            0.083846   0.051114   1.640 0.100948    
## regionChicago:quarter4              0.127502   0.051114   2.494 0.012624 *  
## regionCincinnatiDayton:quarter4     0.133902   0.051114   2.620 0.008809 ** 
## regionColumbus:quarter4             0.055054   0.051114   1.077 0.281460    
## regionDallasFtWorth:quarter4        0.092135   0.051114   1.803 0.071479 .  
## regionDenver:quarter4               0.142444   0.051114   2.787 0.005329 ** 
## regionDetroit:quarter4              0.067250   0.051114   1.316 0.188297    
## regionGrandRapids:quarter4          0.083485   0.051114   1.633 0.102421    
## regionGreatLakes:quarter4           0.083637   0.051114   1.636 0.101797    
## regionHarrisburgScranton:quarter4  -0.018750   0.051114  -0.367 0.713753    
## regionHartfordSpringfield:quarter4  0.006980   0.051114   0.137 0.891376    
## regionHouston:quarter4              0.117934   0.051114   2.307 0.021051 *  
## regionIndianapolis:quarter4         0.085949   0.051114   1.682 0.092683 .  
## regionJacksonville:quarter4         0.063267   0.051114   1.238 0.215820    
## regionLasVegas:quarter4             0.251686   0.051114   4.924 8.55e-07 ***
## regionLosAngeles:quarter4           0.221789   0.051114   4.339 1.44e-05 ***
## regionLouisville:quarter4           0.079365   0.051114   1.553 0.120511    
## regionMiamiFtLauderdale:quarter4   -0.007512   0.051114  -0.147 0.883157    
## regionMidsouth:quarter4             0.082135   0.051114   1.607 0.108096    
## regionNashville:quarter4            0.069400   0.051114   1.358 0.174564    
## regionNewOrleansMobile:quarter4     0.106505   0.051114   2.084 0.037204 *  
## regionNewYork:quarter4              0.064527   0.051114   1.262 0.206818    
## regionNortheast:quarter4           -0.015750   0.051114  -0.308 0.757984    
## regionNorthernNewEngland:quarter4  -0.021446   0.051114  -0.420 0.674803    
## regionOrlando:quarter4              0.074431   0.051114   1.456 0.145360    
## regionPhiladelphia:quarter4         0.043056   0.051114   0.842 0.399599    
## regionPhoenixTucson:quarter4        0.225294   0.051114   4.408 1.05e-05 ***
## regionPittsburgh:quarter4          -0.040990   0.051114  -0.802 0.422601    
## regionPlains:quarter4               0.122912   0.051114   2.405 0.016198 *  
## regionPortland:quarter4             0.182735   0.051114   3.575 0.000351 ***
## regionRaleighGreensboro:quarter4    0.100039   0.051114   1.957 0.050342 .  
## regionRichmondNorfolk:quarter4      0.034752   0.051114   0.680 0.496577    
## regionRoanoke:quarter4              0.036130   0.051114   0.707 0.479670    
## regionSacramento:quarter4           0.111309   0.051114   2.178 0.029445 *  
## regionSanDiego:quarter4             0.252917   0.051114   4.948 7.56e-07 ***
## regionSanFrancisco:quarter4         0.221162   0.051114   4.327 1.52e-05 ***
## regionSeattle:quarter4              0.199074   0.051114   3.895 9.87e-05 ***
## regionSouthCarolina:quarter4        0.081211   0.051114   1.589 0.112120    
## regionSouthCentral:quarter4         0.096061   0.051114   1.879 0.060213 .  
## regionSoutheast:quarter4            0.084130   0.051114   1.646 0.099797 .  
## regionSpokane:quarter4              0.258108   0.051114   5.050 4.47e-07 ***
## regionStLouis:quarter4              0.012980   0.051114   0.254 0.799538    
## regionSyracuse:quarter4            -0.082603   0.051114  -1.616 0.106101    
## regionTampa:quarter4                0.040485   0.051114   0.792 0.428338    
## regionTotalUS:quarter4              0.111267   0.051114   2.177 0.029506 *  
## regionWest:quarter4                 0.161645   0.051114   3.162 0.001567 ** 
## regionWestTexNewMexico:quarter4     0.211605   0.051205   4.133 3.60e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.242 on 18029 degrees of freedom
## Multiple R-squared:  0.6431, Adjusted R-squared:  0.6388 
## F-statistic: 148.4 on 219 and 18029 DF,  p-value: < 2.2e-16
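The region:quarter model uses 219 predictors to reach an adjusted R-squared of 0.6388, while the type:quarter and type:year models get 0.6209 and 0.6269 with 63 predictors each. When a small gain in fit costs that many extra coefficients, it can help to look at criteria that penalise complexity more explicitly, such as AIC and BIC (lower is better for both). This is a sketch on the built-in mtcars data as a stand-in (not the avocado data); the names `small`, `large` and `penalised` are illustrative.

```r
# Stand-in data: mtcars ships with R; it is NOT the avocado data.
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))

small <- lm(mpg ~ wt + cyl + am, data = cars)   # main effects only
large <- update(small, . ~ . + cyl:am)          # adds an interaction

# AIC and BIC both reward fit and penalise the number of parameters;
# BIC's penalty grows with the sample size, so it is stricter here.
penalised <- data.frame(
  model  = c("small", "large"),
  n_coef = c(length(coef(small)), length(coef(large))),
  AIC    = c(AIC(small), AIC(large)),
  BIC    = c(BIC(small), BIC(large))
)
print(penalised)
```

If the larger model does not win on BIC, that is a hint the extra interaction terms may be fitting noise rather than structure.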
# Interaction candidate: region:year (do year-on-year price changes differ between regions?)
model5pe <- lm(average_price ~ type + region + quarter + year + region:year, data = trimmed_avocados)
summary(model5pe)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     region:year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03093 -0.14190 -0.00143  0.13797  1.38892 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.175e+00  2.396e-02  49.047  < 2e-16 ***
## typeorganic                         4.959e-01  3.575e-03 138.719  < 2e-16 ***
## regionAtlanta                      -1.582e-01  3.349e-02  -4.724 2.33e-06 ***
## regionBaltimoreWashington          -1.699e-01  3.349e-02  -5.074 3.94e-07 ***
## regionBoise                        -1.650e-01  3.349e-02  -4.927 8.40e-07 ***
## regionBoston                       -6.519e-02  3.349e-02  -1.947 0.051566 .  
## regionBuffaloRochester              5.865e-03  3.349e-02   0.175 0.860955    
## regionCalifornia                   -2.229e-01  3.349e-02  -6.656 2.89e-11 ***
## regionCharlotte                     3.702e-02  3.349e-02   1.106 0.268948    
## regionChicago                      -1.347e-01  3.349e-02  -4.023 5.77e-05 ***
## regionCincinnatiDayton             -3.364e-01  3.349e-02 -10.047  < 2e-16 ***
## regionColumbus                     -2.649e-01  3.349e-02  -7.911 2.70e-15 ***
## regionDallasFtWorth                -4.609e-01  3.349e-02 -13.763  < 2e-16 ***
## regionDenver                       -3.510e-01  3.349e-02 -10.481  < 2e-16 ***
## regionDetroit                      -2.005e-01  3.349e-02  -5.987 2.18e-09 ***
## regionGrandRapids                  -1.224e-01  3.349e-02  -3.655 0.000258 ***
## regionGreatLakes                   -2.125e-01  3.349e-02  -6.346 2.26e-10 ***
## regionHarrisburgScranton           -6.712e-02  3.349e-02  -2.004 0.045053 *  
## regionHartfordSpringfield           2.090e-01  3.349e-02   6.243 4.40e-10 ***
## regionHouston                      -4.907e-01  3.349e-02 -14.653  < 2e-16 ***
## regionIndianapolis                 -1.958e-01  3.349e-02  -5.846 5.11e-09 ***
## regionJacksonville                 -3.567e-02  3.349e-02  -1.065 0.286744    
## regionLasVegas                     -1.699e-01  3.349e-02  -5.074 3.94e-07 ***
## regionLosAngeles                   -3.862e-01  3.349e-02 -11.535  < 2e-16 ***
## regionLouisville                   -2.443e-01  3.349e-02  -7.296 3.08e-13 ***
## regionMiamiFtLauderdale            -1.552e-01  3.349e-02  -4.635 3.60e-06 ***
## regionMidsouth                     -1.874e-01  3.349e-02  -5.597 2.22e-08 ***
## regionNashville                    -2.615e-01  3.349e-02  -7.810 6.01e-15 ***
## regionNewOrleansMobile             -2.711e-01  3.349e-02  -8.095 6.10e-16 ***
## regionNewYork                       1.058e-01  3.349e-02   3.159 0.001588 ** 
## regionNortheast                     5.000e-03  3.349e-02   0.149 0.881305    
## regionNorthernNewEngland           -6.538e-02  3.349e-02  -1.953 0.050881 .  
## regionOrlando                      -3.942e-02  3.349e-02  -1.177 0.239087    
## regionPhiladelphia                  1.644e-02  3.349e-02   0.491 0.623415    
## regionPhoenixTucson                -3.816e-01  3.349e-02 -11.397  < 2e-16 ***
## regionPittsburgh                   -1.315e-01  3.349e-02  -3.928 8.59e-05 ***
## regionPlains                       -1.009e-01  3.349e-02  -3.012 0.002597 ** 
## regionPortland                     -2.319e-01  3.349e-02  -6.926 4.47e-12 ***
## regionRaleighGreensboro            -8.933e-02  3.349e-02  -2.668 0.007646 ** 
## regionRichmondNorfolk              -2.642e-01  3.349e-02  -7.891 3.17e-15 ***
## regionRoanoke                      -3.116e-01  3.349e-02  -9.306  < 2e-16 ***
## regionSacramento                   -8.471e-02  3.349e-02  -2.530 0.011422 *  
## regionSanDiego                     -2.645e-01  3.349e-02  -7.899 2.96e-15 ***
## regionSanFrancisco                  8.231e-02  3.349e-02   2.458 0.013981 *  
## regionSeattle                      -1.165e-01  3.349e-02  -3.480 0.000502 ***
## regionSouthCarolina                -8.404e-02  3.349e-02  -2.510 0.012093 *  
## regionSouthCentral                 -4.267e-01  3.349e-02 -12.744  < 2e-16 ***
## regionSoutheast                    -1.240e-01  3.349e-02  -3.704 0.000213 ***
## regionSpokane                      -1.384e-01  3.349e-02  -4.132 3.61e-05 ***
## regionStLouis                      -3.538e-02  3.349e-02  -1.057 0.290659    
## regionSyracuse                     -9.712e-03  3.349e-02  -0.290 0.771804    
## regionTampa                        -1.821e-01  3.349e-02  -5.439 5.44e-08 ***
## regionTotalUS                      -2.813e-01  3.349e-02  -8.402  < 2e-16 ***
## regionWest                         -3.010e-01  3.349e-02  -8.988  < 2e-16 ***
## regionWestTexNewMexico             -2.766e-01  3.357e-02  -8.239  < 2e-16 ***
## quarter2                            8.108e-02  5.262e-03  15.407  < 2e-16 ***
## quarter3                            2.189e-01  5.262e-03  41.602  < 2e-16 ***
## quarter4                            1.620e-01  5.229e-03  30.974  < 2e-16 ***
## year2016                           -4.808e-03  3.349e-02  -0.144 0.885838    
## year2017                            9.820e-02  3.333e-02   2.947 0.003217 ** 
## year2018                            1.257e-02  5.478e-02   0.230 0.818454    
## regionAtlanta:year2016             -1.616e-01  4.736e-02  -3.413 0.000643 ***
## regionBaltimoreWashington:year2016  2.236e-01  4.736e-02   4.721 2.37e-06 ***
## regionBoise:year2016               -2.270e-01  4.736e-02  -4.794 1.65e-06 ***
## regionBoston:year2016              -4.260e-02  4.736e-02  -0.899 0.368404    
## regionBuffaloRochester:year2016    -5.596e-02  4.736e-02  -1.182 0.237332    
## regionCalifornia:year2016           1.885e-02  4.736e-02   0.398 0.690658    
## regionCharlotte:year2016           -7.308e-02  4.736e-02  -1.543 0.122814    
## regionChicago:year2016              1.481e-01  4.736e-02   3.127 0.001769 ** 
## regionCincinnatiDayton:year2016    -1.091e-01  4.736e-02  -2.305 0.021203 *  
## regionColumbus:year2016            -8.269e-02  4.736e-02  -1.746 0.080796 .  
## regionDallasFtWorth:year2016       -7.692e-02  4.736e-02  -1.624 0.104317    
## regionDenver:year2016              -8.981e-02  4.736e-02  -1.896 0.057918 .  
## regionDetroit:year2016             -1.611e-01  4.736e-02  -3.401 0.000673 ***
## regionGrandRapids:year2016          9.779e-02  4.736e-02   2.065 0.038940 *  
## regionGreatLakes:year2016          -4.442e-02  4.736e-02  -0.938 0.348222    
## regionHarrisburgScranton:year2016   4.481e-02  4.736e-02   0.946 0.344065    
## regionHartfordSpringfield:year2016  1.081e-01  4.736e-02   2.282 0.022488 *  
## regionHouston:year2016             -5.135e-02  4.736e-02  -1.084 0.278264    
## regionIndianapolis:year2016        -3.663e-02  4.736e-02  -0.774 0.439177    
## regionJacksonville:year2016        -1.306e-01  4.736e-02  -2.757 0.005833 ** 
## regionLasVegas:year2016            -1.163e-02  4.736e-02  -0.246 0.805929    
## regionLosAngeles:year2016          -6.394e-02  4.736e-02  -1.350 0.176953    
## regionLouisville:year2016          -7.808e-02  4.736e-02  -1.649 0.099221 .  
## regionMiamiFtLauderdale:year2016   -9.894e-02  4.736e-02  -2.089 0.036692 *  
## regionMidsouth:year2016             4.327e-03  4.736e-02   0.091 0.927199    
## regionNashville:year2016           -1.562e-01  4.736e-02  -3.299 0.000971 ***
## regionNewOrleansMobile:year2016    -1.423e-02  4.736e-02  -0.301 0.763794    
## regionNewYork:year2016              1.223e-01  4.736e-02   2.583 0.009810 ** 
## regionNortheast:year2016            5.673e-02  4.736e-02   1.198 0.230946    
## regionNorthernNewEngland:year2016  -7.587e-02  4.736e-02  -1.602 0.109168    
## regionOrlando:year2016             -1.237e-01  4.736e-02  -2.613 0.008978 ** 
## regionPhiladelphia:year2016         1.244e-01  4.736e-02   2.627 0.008611 ** 
## regionPhoenixTucson:year2016        1.064e-01  4.736e-02   2.248 0.024607 *  
## regionPittsburgh:year2016          -5.904e-02  4.736e-02  -1.247 0.212525    
## regionPlains:year2016              -5.558e-02  4.736e-02  -1.174 0.240571    
## regionPortland:year2016            -1.104e-01  4.736e-02  -2.331 0.019767 *  
## regionRaleighGreensboro:year2016    3.173e-03  4.736e-02   0.067 0.946579    
## regionRichmondNorfolk:year2016     -5.856e-02  4.736e-02  -1.237 0.216273    
## regionRoanoke:year2016             -7.481e-02  4.736e-02  -1.580 0.114195    
## regionSacramento:year2016           2.189e-01  4.736e-02   4.623 3.80e-06 ***
## regionSanDiego:year2016             4.433e-02  4.736e-02   0.936 0.349267    
## regionSanFrancisco:year2016         2.650e-01  4.736e-02   5.596 2.23e-08 ***
## regionSeattle:year2016             -1.171e-01  4.736e-02  -2.473 0.013404 *  
## regionSouthCarolina:year2016       -1.449e-01  4.736e-02  -3.060 0.002217 ** 
## regionSouthCentral:year2016        -8.029e-02  4.736e-02  -1.695 0.090012 .  
## regionSoutheast:year2016           -1.230e-01  4.736e-02  -2.597 0.009413 ** 
## regionSpokane:year2016             -6.202e-02  4.736e-02  -1.310 0.190334    
## regionStLouis:year2016             -3.131e-01  4.736e-02  -6.611 3.92e-11 ***
## regionSyracuse:year2016            -2.077e-02  4.736e-02  -0.439 0.660973    
## regionTampa:year2016               -8.731e-02  4.736e-02  -1.844 0.065251 .  
## regionTotalUS:year2016              1.096e-02  4.736e-02   0.231 0.816951    
## regionWest:year2016                -5.212e-02  4.736e-02  -1.101 0.271127    
## regionWestTexNewMexico:year2016    -1.074e-02  4.741e-02  -0.226 0.820854    
## regionAtlanta:year2017             -5.088e-02  4.713e-02  -1.080 0.280337    
## regionBaltimoreWashington:year2017  2.115e-01  4.713e-02   4.488 7.25e-06 ***
## regionBoise:year2017                1.981e-02  4.713e-02   0.420 0.674245    
## regionBoston:year2017               1.069e-01  4.713e-02   2.268 0.023347 *  
## regionBuffaloRochester:year2017    -5.596e-02  4.713e-02  -1.187 0.235126    
## regionCalifornia:year2017           1.189e-01  4.713e-02   2.523 0.011639 *  
## regionCharlotte:year2017            9.496e-02  4.713e-02   2.015 0.043940 *  
## regionChicago:year2017              2.117e-01  4.713e-02   4.491 7.12e-06 ***
## regionCincinnatiDayton:year2017     1.805e-02  4.713e-02   0.383 0.701811    
## regionColumbus:year2017            -5.727e-02  4.713e-02  -1.215 0.224378    
## regionDallasFtWorth:year2017        1.633e-05  4.713e-02   0.000 0.999724    
## regionDenver:year2017               7.087e-02  4.713e-02   1.504 0.132705    
## regionDetroit:year2017             -9.829e-02  4.713e-02  -2.085 0.037040 *  
## regionGrandRapids:year2017          1.123e-01  4.713e-02   2.383 0.017189 *  
## regionGreatLakes:year2017          -7.076e-04  4.713e-02  -0.015 0.988023    
## regionHarrisburgScranton:year2017   2.504e-02  4.713e-02   0.531 0.595237    
## regionHartfordSpringfield:year2017  4.143e-02  4.713e-02   0.879 0.379365    
## regionHouston:year2017             -4.310e-02  4.713e-02  -0.914 0.360486    
## regionIndianapolis:year2017        -1.113e-01  4.713e-02  -2.362 0.018208 *  
## regionJacksonville:year2017         6.935e-02  4.713e-02   1.471 0.141188    
## regionLasVegas:year2017            -5.010e-02  4.713e-02  -1.063 0.287846    
## regionLosAngeles:year2017           1.258e-01  4.713e-02   2.669 0.007623 ** 
## regionLouisville:year2017          -3.643e-02  4.713e-02  -0.773 0.439599    
## regionMiamiFtLauderdale:year2017    1.549e-01  4.713e-02   3.287 0.001016 ** 
## regionMidsouth:year2017             7.014e-02  4.713e-02   1.488 0.136728    
## regionNashville:year2017           -1.363e-01  4.713e-02  -2.892 0.003836 ** 
## regionNewOrleansMobile:year2017     5.228e-02  4.713e-02   1.109 0.267311    
## regionNewYork:year2017              6.631e-02  4.713e-02   1.407 0.159498    
## regionNortheast:year2017            5.123e-02  4.713e-02   1.087 0.277109    
## regionNorthernNewEngland:year2017   4.724e-03  4.713e-02   0.100 0.920160    
## regionOrlando:year2017              8.178e-02  4.713e-02   1.735 0.082730 .  
## regionPhiladelphia:year2017         5.299e-02  4.713e-02   1.124 0.260891    
## regionPhoenixTucson:year2017        1.635e-02  4.713e-02   0.347 0.728647    
## regionPittsburgh:year2017          -1.434e-01  4.713e-02  -3.042 0.002355 ** 
## regionPlains:year2017              -2.649e-02  4.713e-02  -0.562 0.574052    
## regionPortland:year2017             2.843e-02  4.713e-02   0.603 0.546348    
## regionRaleighGreensboro:year2017    2.202e-01  4.713e-02   4.671 3.01e-06 ***
## regionRichmondNorfolk:year2017      2.565e-02  4.713e-02   0.544 0.586360    
## regionRoanoke:year2017              3.211e-02  4.713e-02   0.681 0.495754    
## regionSacramento:year2017           2.209e-01  4.713e-02   4.688 2.78e-06 ***
## regionSanDiego:year2017             2.112e-01  4.713e-02   4.481 7.46e-06 ***
## regionSanFrancisco:year2017         2.458e-01  4.713e-02   5.215 1.86e-07 ***
## regionSeattle:year2017              7.805e-02  4.713e-02   1.656 0.097751 .  
## regionSouthCarolina:year2017       -7.398e-02  4.713e-02  -1.570 0.116516    
## regionSouthCentral:year2017        -4.827e-02  4.713e-02  -1.024 0.305789    
## regionSoutheast:year2017           -1.622e-03  4.713e-02  -0.034 0.972549    
## regionSpokane:year2017              1.051e-01  4.713e-02   2.229 0.025817 *  
## regionStLouis:year2017             -1.065e-02  4.713e-02  -0.226 0.821183    
## regionSyracuse:year2017            -3.868e-02  4.713e-02  -0.821 0.411787    
## regionTampa:year2017                1.636e-01  4.713e-02   3.472 0.000519 ***
## regionTotalUS:year2017              8.012e-02  4.713e-02   1.700 0.089167 .  
## regionWest:year2017                 5.313e-02  4.713e-02   1.127 0.259636    
## regionWestTexNewMexico:year2017    -7.563e-02  4.730e-02  -1.599 0.109859    
## regionAtlanta:year2018              1.109e-02  7.733e-02   0.143 0.885972    
## regionBaltimoreWashington:year2018  1.124e-01  7.733e-02   1.454 0.146096    
## regionBoise:year2018                2.217e-01  7.733e-02   2.866 0.004156 ** 
## regionBoston:year2018               2.060e-01  7.733e-02   2.664 0.007725 ** 
## regionBuffaloRochester:year2018    -2.154e-01  7.733e-02  -2.786 0.005341 ** 
## regionCalifornia:year2018           1.983e-01  7.733e-02   2.564 0.010347 *  
## regionCharlotte:year2018            9.647e-03  7.733e-02   0.125 0.900720    
## regionChicago:year2018              2.605e-01  7.733e-02   3.369 0.000756 ***
## regionCincinnatiDayton:year2018     1.764e-01  7.733e-02   2.282 0.022523 *  
## regionColumbus:year2018             7.372e-04  7.733e-02   0.010 0.992394    
## regionDallasFtWorth:year2018        1.279e-01  7.733e-02   1.655 0.098035 .  
## regionDenver:year2018               1.960e-01  7.733e-02   2.534 0.011284 *  
## regionDetroit:year2018             -5.744e-02  7.733e-02  -0.743 0.457661    
## regionGrandRapids:year2018          1.490e-02  7.733e-02   0.193 0.847176    
## regionGreatLakes:year2018           5.500e-02  7.733e-02   0.711 0.476957    
## regionHarrisburgScranton:year2018  -3.205e-02  7.733e-02  -0.414 0.678539    
## regionHartfordSpringfield:year2018  3.263e-02  7.733e-02   0.422 0.673085    
## regionHouston:year2018              9.692e-02  7.733e-02   1.253 0.210099    
## regionIndianapolis:year2018        -7.173e-02  7.733e-02  -0.928 0.353643    
## regionJacksonville:year2018         5.651e-02  7.733e-02   0.731 0.464972    
## regionLasVegas:year2018             1.278e-01  7.733e-02   1.653 0.098372 .  
## regionLosAngeles:year2018           3.021e-01  7.733e-02   3.906 9.41e-05 ***
## regionLouisville:year2018           7.641e-02  7.733e-02   0.988 0.323126    
## regionMiamiFtLauderdale:year2018    6.353e-02  7.733e-02   0.821 0.411391    
## regionMidsouth:year2018             1.099e-01  7.733e-02   1.421 0.155277    
## regionNashville:year2018            4.821e-02  7.733e-02   0.623 0.533060    
## regionNewOrleansMobile:year2018     3.939e-02  7.733e-02   0.509 0.610495    
## regionNewYork:year2018              3.298e-02  7.733e-02   0.426 0.669761    
## regionNortheast:year2018            3.333e-02  7.733e-02   0.431 0.666443    
## regionNorthernNewEngland:year2018   5.080e-02  7.733e-02   0.657 0.511237    
## regionOrlando:year2018             -4.183e-02  7.733e-02  -0.541 0.588600    
## regionPhiladelphia:year2018        -3.526e-03  7.733e-02  -0.046 0.963637    
## regionPhoenixTucson:year2018        1.008e-01  7.733e-02   1.303 0.192425    
## regionPittsburgh:year2018          -2.888e-02  7.733e-02  -0.373 0.708831    
## regionPlains:year2018               2.462e-02  7.733e-02   0.318 0.750255    
## regionPortland:year2018             1.923e-01  7.733e-02   2.487 0.012884 *  
## regionRaleighGreensboro:year2018    1.885e-01  7.733e-02   2.437 0.014800 *  
## regionRichmondNorfolk:year2018      6.340e-02  7.733e-02   0.820 0.412336    
## regionRoanoke:year2018              1.616e-01  7.733e-02   2.090 0.036619 *  
## regionSacramento:year2018           1.210e-01  7.733e-02   1.564 0.117791    
## regionSanDiego:year2018             3.066e-01  7.733e-02   3.965 7.38e-05 ***
## regionSanFrancisco:year2018         3.144e-02  7.733e-02   0.407 0.684315    
## regionSeattle:year2018              1.357e-01  7.733e-02   1.755 0.079304 .  
## regionSouthCarolina:year2018       -8.346e-02  7.733e-02  -1.079 0.280485    
## regionSouthCentral:year2018         9.548e-02  7.733e-02   1.235 0.216963    
## regionSoutheast:year2018           -8.878e-03  7.733e-02  -0.115 0.908600    
## regionSpokane:year2018              1.275e-01  7.733e-02   1.649 0.099134 .  
## regionStLouis:year2018              6.538e-02  7.733e-02   0.846 0.397840    
## regionSyracuse:year2018            -1.757e-01  7.733e-02  -2.272 0.023093 *  
## regionTampa:year2018                7.712e-02  7.733e-02   0.997 0.318681    
## regionTotalUS:year2018              1.526e-01  7.733e-02   1.973 0.048481 *  
## regionWest:year2018                 1.622e-01  7.733e-02   2.098 0.035954 *  
## regionWestTexNewMexico:year2018     9.199e-02  7.737e-02   1.189 0.234465    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2415 on 18029 degrees of freedom
## Multiple R-squared:  0.6447, Adjusted R-squared:  0.6404 
## F-statistic: 149.4 on 219 and 18029 DF,  p-value: < 2.2e-16
model5pf <- lm(average_price ~ type + region + quarter + year + quarter:year, data = trimmed_avocados)
summary(model5pf)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     quarter:year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96042 -0.13634 -0.00203  0.13537  1.48398 
## 
## Coefficients: (3 not defined because of singularities)
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.259208   0.014541  86.600  < 2e-16 ***
## typeorganic                0.495932   0.003553 139.577  < 2e-16 ***
## regionAtlanta             -0.223077   0.018461 -12.084  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.018461  -1.452 0.146526    
## regionBoise               -0.212899   0.018461 -11.532  < 2e-16 ***
## regionBoston              -0.030148   0.018461  -1.633 0.102472    
## regionBuffaloRochester    -0.044201   0.018461  -2.394 0.016662 *  
## regionCalifornia          -0.165710   0.018461  -8.976  < 2e-16 ***
## regionCharlotte            0.045000   0.018461   2.438 0.014795 *  
## regionChicago             -0.004260   0.018461  -0.231 0.817490    
## regionCincinnatiDayton    -0.351834   0.018461 -19.058  < 2e-16 ***
## regionColumbus            -0.308254   0.018461 -16.698  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.018461 -25.754  < 2e-16 ***
## regionDenver              -0.342456   0.018461 -18.550  < 2e-16 ***
## regionDetroit             -0.284941   0.018461 -15.435  < 2e-16 ***
## regionGrandRapids         -0.056036   0.018461  -3.035 0.002406 ** 
## regionGreatLakes          -0.222485   0.018461 -12.052  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.018461  -2.587 0.009700 ** 
## regionHartfordSpringfield  0.257604   0.018461  13.954  < 2e-16 ***
## regionHouston             -0.513107   0.018461 -27.794  < 2e-16 ***
## regionIndianapolis        -0.247041   0.018461 -13.382  < 2e-16 ***
## regionJacksonville        -0.050089   0.018461  -2.713 0.006669 ** 
## regionLasVegas            -0.180118   0.018461  -9.757  < 2e-16 ***
## regionLosAngeles          -0.345030   0.018461 -18.690  < 2e-16 ***
## regionLouisville          -0.274349   0.018461 -14.861  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.018461  -7.180 7.25e-13 ***
## regionMidsouth            -0.156272   0.018461  -8.465  < 2e-16 ***
## regionNashville           -0.348935   0.018461 -18.901  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.018461 -13.880  < 2e-16 ***
## regionNewYork              0.166538   0.018461   9.021  < 2e-16 ***
## regionNortheast            0.040888   0.018461   2.215 0.026785 *  
## regionNorthernNewEngland  -0.083639   0.018461  -4.531 5.92e-06 ***
## regionOrlando             -0.054822   0.018461  -2.970 0.002985 ** 
## regionPhiladelphia         0.071095   0.018461   3.851 0.000118 ***
## regionPhoenixTucson       -0.336598   0.018461 -18.233  < 2e-16 ***
## regionPittsburgh          -0.196716   0.018461 -10.656  < 2e-16 ***
## regionPlains              -0.124527   0.018461  -6.745 1.57e-11 ***
## regionPortland            -0.243314   0.018461 -13.180  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.018461  -0.321 0.748575    
## regionRichmondNorfolk     -0.269704   0.018461 -14.609  < 2e-16 ***
## regionRoanoke             -0.313107   0.018461 -16.961  < 2e-16 ***
## regionSacramento           0.060533   0.018461   3.279 0.001044 ** 
## regionSanDiego            -0.162870   0.018461  -8.822  < 2e-16 ***
## regionSanFrancisco         0.243166   0.018461  13.172  < 2e-16 ***
## regionSeattle             -0.118462   0.018461  -6.417 1.43e-10 ***
## regionSouthCarolina       -0.157751   0.018461  -8.545  < 2e-16 ***
## regionSouthCentral        -0.459793   0.018461 -24.906  < 2e-16 ***
## regionSoutheast           -0.163018   0.018461  -8.830  < 2e-16 ***
## regionSpokane             -0.115444   0.018461  -6.253 4.11e-10 ***
## regionStLouis             -0.130414   0.018461  -7.064 1.67e-12 ***
## regionSyracuse            -0.040710   0.018461  -2.205 0.027452 *  
## regionTampa               -0.152189   0.018461  -8.244  < 2e-16 ***
## regionTotalUS             -0.242012   0.018461 -13.109  < 2e-16 ***
## regionWest                -0.288817   0.018461 -15.645  < 2e-16 ***
## regionWestTexNewMexico    -0.296594   0.018502 -16.030  < 2e-16 ***
## quarter2                   0.021204   0.009058   2.341 0.019248 *  
## quarter3                   0.082991   0.009058   9.162  < 2e-16 ***
## quarter4                  -0.010357   0.009060  -1.143 0.252944    
## year2016                  -0.117821   0.009058 -13.007  < 2e-16 ***
## year2017                  -0.056574   0.009058  -6.246 4.31e-10 ***
## year2018                  -0.004613   0.009245  -0.499 0.617792    
## quarter2:year2016         -0.028533   0.012810  -2.227 0.025932 *  
## quarter3:year2016          0.095192   0.012810   7.431 1.12e-13 ***
## quarter4:year2016          0.256768   0.012811  20.043  < 2e-16 ***
## quarter2:year2017          0.208350   0.012812  16.262  < 2e-16 ***
## quarter3:year2017          0.312536   0.012810  24.398  < 2e-16 ***
## quarter4:year2017          0.261262   0.012696  20.578  < 2e-16 ***
## quarter2:year2018                NA         NA      NA       NA    
## quarter3:year2018                NA         NA      NA       NA    
## quarter4:year2018                NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.24 on 18182 degrees of freedom
## Multiple R-squared:  0.6461, Adjusted R-squared:  0.6448 
## F-statistic: 502.9 on 66 and 18182 DF,  p-value: < 2.2e-16


So it looks like model5pa, with type, region, quarter, year and the type:region interaction, is the best, with a moderate gain in multiple-\(r^2\) due to the interaction. However, the \(p\)-values of the individual interaction coefficients vary widely, so we need to test the significance of the interaction as a whole.

anova(model5, model5pa)

Neat, it looks like including the interaction is statistically justified, so we can keep it in. Our final model is:

average_price ~ type + region + quarter + year + type:region
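For reference, the comparison that anova() makes between two nested lm fits is a partial F-test on the extra terms. Here is a minimal sketch of that pattern, using R's built-in mtcars data as a stand-in for the avocado data (the variables and model names below are for illustration only):

```r
# Illustration only: mtcars stands in for the avocado data.
cars <- transform(mtcars, am = factor(am), cyl = factor(cyl))

# Smaller model: main effects only
fit_main <- lm(mpg ~ am + cyl, data = cars)
# Larger model: same main effects plus their interaction
fit_int <- lm(mpg ~ am + cyl + am:cyl, data = cars)

# Partial F-test: do the extra interaction terms reduce the residual
# sum of squares by more than we would expect by chance?
comparison <- anova(fit_main, fit_int)
comparison

# The p-value for the added terms sits in the second row
p_interaction <- comparison[["Pr(>F)"]][2]
p_interaction < 0.05
```

The test treats the interaction as a single block of terms, which is why it is more appropriate here than reading the individual coefficient p-values one by one.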


3 Automated approach : leaps

If you wanted to do a predictive (automatic) model, you could follow the same process, using the following code:

library(leaps)

regsubsets_forward <- regsubsets(average_price ~ ., 
                                 data = trimmed_avocados, 
                                 nvmax = 12,
                                 method = "forward")

plot(regsubsets_forward)


From the plot, it seems like the best-performing model includes type, year, some (but not all) levels of region, and some (but not all) levels of quarter.

We can then plot the BIC score:


# Plot the BIC score against the number of predictors in the model
plot(summary(regsubsets_forward)$bic, type = "b")

From this, it seems like the BIC score doesn’t really get that much lower after including 8 different variables. We can check which variables these are:
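As a reminder of what the BIC trades off (goodness of fit against a penalty for each extra parameter), here is a small sketch that computes it by hand for an lm fit and checks it against stats::BIC(). It uses the built-in mtcars data purely as a stand-in, and the object names are illustrative:

```r
# Illustration only: mtcars stands in for the avocado data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nobs(fit)          # number of observations
ll <- logLik(fit)        # maximised log-likelihood
k  <- attr(ll, "df")     # estimated parameters (coefficients + error variance)

# BIC = k * log(n) - 2 * log-likelihood; lower is better
bic_by_hand <- k * log(n) - 2 * as.numeric(ll)

all.equal(bic_by_hand, BIC(fit))  # the two should agree
```

Each extra parameter costs log(n), so a new predictor only lowers the BIC if it improves the log-likelihood by more than that penalty. This is why the curve flattens once the most useful predictors are in.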


summary(regsubsets_forward)$which[8, ]
##               (Intercept)              total_volume                     x4046 
##                      TRUE                     FALSE                     FALSE 
##                small_bags                large_bags              x_large_bags 
##                     FALSE                     FALSE                     FALSE 
##               typeorganic                  year2016                  year2017 
##                      TRUE                     FALSE                      TRUE 
##                  year2018             regionAtlanta regionBaltimoreWashington 
##                     FALSE                     FALSE                     FALSE 
##               regionBoise              regionBoston    regionBuffaloRochester 
##                     FALSE                     FALSE                     FALSE 
##          regionCalifornia           regionCharlotte             regionChicago 
##                     FALSE                     FALSE                     FALSE 
##    regionCincinnatiDayton            regionColumbus       regionDallasFtWorth 
##                     FALSE                     FALSE                     FALSE 
##              regionDenver             regionDetroit         regionGrandRapids 
##                     FALSE                     FALSE                     FALSE 
##          regionGreatLakes  regionHarrisburgScranton regionHartfordSpringfield 
##                     FALSE                     FALSE                      TRUE 
##             regionHouston        regionIndianapolis        regionJacksonville 
##                      TRUE                     FALSE                     FALSE 
##            regionLasVegas          regionLosAngeles          regionLouisville 
##                     FALSE                     FALSE                     FALSE 
##   regionMiamiFtLauderdale            regionMidsouth           regionNashville 
##                     FALSE                     FALSE                     FALSE 
##    regionNewOrleansMobile             regionNewYork           regionNortheast 
##                     FALSE                      TRUE                     FALSE 
##  regionNorthernNewEngland             regionOrlando        regionPhiladelphia 
##                     FALSE                     FALSE                     FALSE 
##       regionPhoenixTucson          regionPittsburgh              regionPlains 
##                     FALSE                     FALSE                     FALSE 
##            regionPortland   regionRaleighGreensboro     regionRichmondNorfolk 
##                     FALSE                     FALSE                     FALSE 
##             regionRoanoke          regionSacramento            regionSanDiego 
##                     FALSE                     FALSE                     FALSE 
##        regionSanFrancisco             regionSeattle       regionSouthCarolina 
##                      TRUE                     FALSE                     FALSE 
##        regionSouthCentral           regionSoutheast             regionSpokane 
##                     FALSE                     FALSE                     FALSE 
##             regionStLouis            regionSyracuse               regionTampa 
##                     FALSE                     FALSE                     FALSE 
##             regionTotalUS                regionWest    regionWestTexNewMexico 
##                     FALSE                     FALSE                     FALSE 
##                  quarter2                  quarter3                  quarter4 
##                     FALSE                      TRUE                      TRUE

Given the entries that are TRUE, the best model includes type, year, some regions and some quarters. We can include type and year in our model, and then test whether region and quarter should be added.
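Scanning the full TRUE/FALSE table by eye is error-prone. As a sketch of a shortcut: a named logical vector, like the row returned by summary(regsubsets_forward)$which[8, ], can be reduced to just the selected names with names(which(...)). The vector below is a hand-made stand-in containing a few of the entries from the output above, not the real object:

```r
# Hand-made stand-in for one row of summary(regsubsets_forward)$which
selected <- c(
  "(Intercept)" = TRUE,
  total_volume  = FALSE,
  typeorganic   = TRUE,
  year2016      = FALSE,
  year2017      = TRUE,
  regionHouston = TRUE,
  quarter2      = FALSE,
  quarter3      = TRUE
)

# Keep only the names of the terms flagged TRUE, dropping the intercept
included <- setdiff(names(which(selected)), "(Intercept)")
included
```

On the real object the equivalent call would be names(which(summary(regsubsets_forward)$which[8, ])).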

# test if we should put region in
mod_type_year <- lm(average_price ~ type + year, data = trimmed_avocados)
mod_type_region <- lm(average_price ~ type + year + region, data = trimmed_avocados)
anova(mod_type_year, mod_type_region)
# yep, it's significant so we can put that in.

# test if we should put quarter in
mod_type_quarter <- lm(average_price ~ type + year + quarter, data = trimmed_avocados)
anova(mod_type_year, mod_type_quarter)
# yep, it's significant so we can put that in.

# now let's test if the model with region and quarter is different from the one with just region
mod_type_region_quarter <- lm(average_price ~ type + year + region + quarter, data = trimmed_avocados)
anova(mod_type_region, mod_type_region_quarter)
# Yep, that's significant so I would leave it in.

You can continue to test your interactions in the same way as we did during the manual version above.

4 Optional Extra: Automated approach, glmulti()


We didn’t use this in class, but if you are interested in how to do model building with glmulti(), you can run the code below.


Automated approach : glmulti()

library(glmulti)


This data is pretty big for glmulti on a single CPU core, so we’ll likely not be able to do a search simultaneously for both main effects and pairwise interactions. Let’s look first for the best main effects model using BIC as our metric:


# we're putting set.seed() in here for reproducibility, but you shouldn't include
# this in production code
set.seed(42)
n_data <- nrow(trimmed_avocados)
test_index <- sample(1:n_data, size = n_data * 0.2)

test  <- slice(trimmed_avocados, test_index)
train <- slice(trimmed_avocados, -test_index)

# sanity check
nrow(test) + nrow(train) == n_data
nrow(test)
nrow(train)
glmulti_fit <- glmulti(
  average_price ~ ., 
  data = train,
  level = 1, # 2 = include pairwise interactions, 1 = main effects only (main effect = no pairwise interactions)
  minsize = 1, # no min size of model
  maxsize = -1, # -1 = no max size of model
  marginality = TRUE, # marginality here means the same as 'strongly hierarchical' interactions, i.e. include pairwise interactions only if both predictors present in the model as main effects.
  method = "h", # try exhaustive search, or could use "g" for genetic algorithm instead
  crit = bic, # criteria for model selection is BIC value (lower is better)
  plotty = FALSE, # don't plot models as function runs
  report = TRUE, # do produce reports as function runs
  confsetsize = 10, # return best 10 solutions
  fitfunction = lm # fit using the `lm` function
)
summary(glmulti_fit)


So the lowest BIC model with main effects is average_price ~ type + year + quarter + total_volume + x_large_bags + region. Let’s have a look at possible extensions to this. We’re going to deliberately push to the point where models start to overfit (as tested by the RMSE on the test set), so that we can see what this looks like.
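For reference, the RMSE is just the square root of the mean squared residual on whichever data set is supplied. Below is a base-R sketch of the train/test comparison, using the built-in mtcars data as a stand-in and a hypothetical helper rmse_by_hand() (not part of modelr):

```r
# Illustration only: mtcars stands in for the avocado data.
set.seed(42)
n_rows <- nrow(mtcars)
test_rows <- sample(seq_len(n_rows), size = floor(0.2 * n_rows))

test_set  <- mtcars[test_rows, ]
train_set <- mtcars[-test_rows, ]

# Hypothetical helper: root mean squared error of a model on a data set
rmse_by_hand <- function(model, data, response) {
  residuals <- data[[response]] - predict(model, newdata = data)
  sqrt(mean(residuals^2))
}

# Fit on the training rows only
fit <- lm(mpg ~ wt + hp, data = train_set)

rmse_by_hand(fit, train_set, "mpg")  # error on data the model has seen
rmse_by_hand(fit, test_set, "mpg")   # error on held-out data
```

A test RMSE that is much larger than the train RMSE is the signal of overfitting we are looking for in the comparison that follows.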


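# start with an empty results table; the columns are typed so that add_row() can match them
results <- tibble(
  name = character(), bic = numeric(), rmse_train = numeric(), rmse_test = numeric()
)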


# lowest BIC model with main effects
lowest_bic_model <- lm(average_price ~ type + year + quarter + total_volume + x_large_bags + region, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "lowest bic", 
      bic = bic(lowest_bic_model),
      rmse_train = rmse(lowest_bic_model, train),
      rmse_test = rmse(lowest_bic_model, test)
    )
  )

# try adding in all possible pairs with these main effects
lowest_bic_model_all_pairs <- lm(average_price ~ (type + year + quarter + total_volume + x_large_bags + region)^2, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "lowest bic all pairs", 
      bic = bic(lowest_bic_model_all_pairs),
      rmse_train = rmse(lowest_bic_model_all_pairs, train),
      rmse_test = rmse(lowest_bic_model_all_pairs, test)
    )
  )

# try a model with all main effects
model_all_mains <- lm(average_price ~ ., data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all mains", 
      bic = bic(model_all_mains),
      rmse_train = rmse(model_all_mains, train),
      rmse_test = rmse(model_all_mains, test)
    )
  )

# try a model with all main effects and all pairs
model_all_pairs <- lm(average_price ~ .^2, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all pairs", 
      bic = bic(model_all_pairs),
      rmse_train = rmse(model_all_pairs, train),
      rmse_test = rmse(model_all_pairs, test)
    )
  )

# try a model with all main effects, all pairs and one triple (this is getting silly)
model_all_pairs_one_triple <- lm(average_price ~ .^2 + region:type:year, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all pairs one triple",
      bic = bic(model_all_pairs_one_triple),
      rmse_train = rmse(model_all_pairs_one_triple, train),
      rmse_test = rmse(model_all_pairs_one_triple, test)
    )
  )

# try a model with all main effects, all pairs and multiple triples (more silly)
model_all_pairs_multi_triples <- lm(average_price ~ .^2 + region:type:year + region:type:quarter + region:year:quarter, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all pairs multi triples",
      bic = bic(model_all_pairs_multi_triples),
      rmse_train = rmse(model_all_pairs_multi_triples, train),
      rmse_test = rmse(model_all_pairs_multi_triples, test)
    )
  )
results <- results %>%
  pivot_longer(cols = bic:rmse_test, names_to = "measure", values_to = "value") %>%
  mutate(
    name = fct_relevel(
      as_factor(name),
      "lowest bic", "all mains", "lowest bic all pairs", "all pairs", "all pairs one triple", "all pairs multi triples"
    )
  )
results %>%
  filter(measure == "bic") %>%
  ggplot(aes(x = name, y = value)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  labs(
    x = "model",
    y = "bic"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_hline(aes(yintercept = 0))


BIC is telling us here that if we took our main effects model with lowest BIC and added in all possible pairwise interactions, this would likely still improve the model for predictive purposes. BIC suggests that this ‘lowest bic all pairs’ model will offer the best predictive performance without overfitting, with all other models scoring noticeably worse (higher BIC).

Let’s compare the RMSE values of the various models for train and test sets. We expect train RMSE always to go down as model complexity increases, but what happens to the test RMSE as models get more complex?


results %>%
  filter(measure != "bic") %>%
  ggplot(aes(x = name, y = value, fill = measure)) +
  geom_col(position = "dodge", alpha = 0.7) +
  labs(
    x = "model",
    y = "rmse"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))


The lowest test RMSE is obtained for the ‘lowest bic all pairs’ model, and it increases thereafter for the more complex models, which suggests that these models are overfitting the training data.
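The same pattern can be reproduced on simulated data, where we know the true signal. In this sketch (the data-generating process and all names are made up for illustration) we fit nested polynomial models of increasing degree to a noisy quadratic and record train and test RMSE for each:

```r
# Illustration only: simulated data with a known quadratic signal plus noise.
set.seed(42)
n <- 60
x <- runif(n, -2, 2)
sim <- data.frame(x = x, y = 1 + 2 * x - x^2 + rnorm(n, sd = 1))

test_rows <- sample(seq_len(n), size = 20)
sim_test  <- sim[test_rows, ]
sim_train <- sim[-test_rows, ]

# Root mean squared error of a model on a given data set
rmse_on <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

# Fit nested polynomial models of increasing complexity on the training set
degrees <- 1:8
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d, raw = TRUE), data = sim_train))

rmse_table <- data.frame(
  degree     = degrees,
  rmse_train = sapply(fits, rmse_on, data = sim_train),
  rmse_test  = sapply(fits, rmse_on, data = sim_test)
)
rmse_table
```

Training RMSE can only fall (or stay flat) as terms are added to nested least-squares models. The test RMSE is under no such constraint, which is what makes it useful as a check for overfitting.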